Directed evolution of novel binding proteins

ABSTRACT

In order to obtain a novel binding protein against a chosen target, DNA molecules, each encoding a protein comprising one of a family of similar potential binding domains and a structural signal calling for the display of the protein on the outer surface of a chosen bacterial cell, bacterial spore or phage (genetic package) are introduced into a genetic package. The protein is expressed and the potential binding domain is displayed on the outer surface of the package. The cells or viruses bearing the binding domains which recognize the target molecule are isolated and amplified. The successful binding domains are then characterized. One or more of these successful binding domains is used as a model for the design of a new family of potential binding domains, and the process is repeated until a novel binding domain having a desired affinity for the target molecule is obtained. In one embodiment, the first family of potential binding domains is related to bovine pancreatic trypsin inhibitor, the genetic package is M13 phage, and the protein includes the outer surface transport signal of the M13 gene III protein.

This application is a continuation-in-part of Ladner, Guterman, Roberts, and Markland, Ser. No. 07/487,063, filed Mar. 2, 1990, now abandoned, which is a continuation-in-part of Ladner and Guterman, Ser. No. 07/240,160, filed Sep. 2, 1988, now abandoned. Ser. No. 07/487,063 claimed priority under 35 U.S.C. 119 from PCT Application No. PCT/US89/03731, filed Sep. 1, 1989. All of the foregoing applications are hereby incorporated by reference.

CROSS-REFERENCE TO RELATED APPLICATIONS

The following related and commonly-owned applications are also incorporated by reference:

Robert Charles Ladner, Sonia Kosow Guterman, Rachael Baribault Kent, and Arthur Charles Ley are named as joint inventors on U.S. Ser. No. 07/293,980, now U.S. Pat. No. 5,096,815, filed Jan. 8, 1989, and entitled GENERATION AND SELECTION OF NOVEL DNA-BINDING PROTEINS AND POLYPEPTIDES. This application has been assigned to Protein Engineering Corporation.

Robert Charles Ladner, Sonia Kosow Guterman, and Bruce Lindsay Roberts are named as joint inventors on U.S. Ser. No. 07/470,651, filed 26 Jan. 1990, now abandoned, entitled "PRODUCTION OF NOVEL SEQUENCE-SPECIFIC DNA-ALTERING ENZYMES", likewise assigned to Protein Engineering Corp.

Ladner, Guterman, Kent, Ley, and Markland, Ser. No. 07/558,011 is also assigned to Protein Engineering Corporation.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to development of novel binding proteins (including mini-proteins) by an iterative process of mutagenesis, expression, chromatographic selection, and amplification. In this process, a gene encoding a potential binding domain, said gene being obtained by random mutagenesis of a limited number of predetermined codons, is fused to a genetic element which causes the resulting chimeric expression product to be displayed on the outer surface of a virus (especially a filamentous phage) or a cell. Chromatographic selection is then used to identify viruses or cells whose genome includes such a fused gene which coded for the protein which bound to the chromatographic target.

2. Information Disclosure Statement

A. Protein Structure

The amino acid sequence of a protein determines its three-dimensional (3D) structure, which in turn determines protein function (EPST63, ANFI73). Shortle (SHOR85), Sauer and colleagues (PAKU86, REID88a), and Caruthers and colleagues (EISE85) have shown that some residues on the polypeptide chain are more important than others in determining the 3D structure of a protein. The 3D structure is essentially unaffected by the identity of the amino acids at some loci; at other loci only one or a few types of amino acid is allowed. In most cases, loci where wide variety is allowed have the amino acid side group directed toward the solvent. Loci where limited variety is allowed frequently have the side group directed toward other parts of the protein. Thus substitutions of amino acids that are exposed to solvent are less likely to affect the 3D structure than are substitutions at internal loci. (See also SCHU79, p169-171 and CREI84, p239-245, 314-315).

The secondary structure (helices, sheets, turns, loops) of a protein is determined mostly by local sequence. Although certain amino acids have a propensity to appear in certain "secondary structures," they will be found from time to time in other structures, and studies of pentapeptide sequences found in different proteins have shown that their conformation varies considerably from one occurrence to the next (KABS84, ARGO87). As a result, a priori design of proteins to have a particular 3D structure is difficult.

Several researchers have designed and synthesized proteins de novo (MOSE83, MOSE87, ERIC86). These designed proteins are small and most have been synthesized in vitro as polypeptides rather than genetically. Hecht et al. (HECH90) have produced a designed protein genetically. Moser, et al. state that design of biologically active proteins is currently impossible.

B. Protein Binding Activity

Many proteins bind non-covalently but very tightly and specifically to some other characteristic molecules (SCHU79, CREI84). In each case the binding results from complementarity of the surfaces that come into contact: bumps fit into holes, unlike charges come together, dipoles align, and hydrophobic atoms contact other hydrophobic atoms. Although bulk water is excluded, individual water molecules are frequently found filling space in intermolecular interfaces; these waters usually form hydrogen bonds to one or more atoms of the protein or to other bound water. Thus proteins found in nature have not attained, nor do they require, perfect complementarity to bind tightly and specifically to their substrates. Only in rare cases is there essentially perfect complementarity; then the binding is extremely tight (as, for example, avidin binding to biotin).

C. Protein Engineering

"Protein engineering" is the art of manipulating the sequence of a protein in order to alter its binding characteristics. The factors affecting protein binding are known (CHOT75, CHOT76, SCHU79, p98-107, and CREI84, Ch8), but designing new complementary surfaces has proved difficult. Although some rules have been developed for substituting side groups (SUTC87b), the side groups of proteins are floppy and it is difficult to predict what conformation a new side group will take. Further, the forces that bind proteins to other molecules are all relatively weak and it is difficult to predict the effects of these forces.

Recently, Quiocho and collaborators (QUIO87) elucidated the structures of several periplasmic binding proteins from Gram-negative bacteria. They found that the proteins, despite having low sequence homology and differences in structural detail, have certain important structural similarities. Based on their investigations of these binding proteins, Quiocho et al. suggest it is unlikely that, using current protein engineering methods, proteins can be constructed with binding properties superior to those of proteins that occur naturally.

Nonetheless, there have been some isolated successes. Wilkinson et al. (WILK84) reported that a mutant of the tyrosyl tRNA synthetase of Bacillus stearothermophilus with the mutation Thr₅₁ → Pro exhibits a 100-fold increase in affinity for ATP. Tan and Kaiser (TANK77) and Tschesche et al. (TSCH87) showed that changing a single amino acid in a mini-protein greatly reduces its binding to trypsin, but that some of the mutants retained the parental characteristic of binding to and inhibiting chymotrypsin, while others exhibited new binding to elastase. Caruthers and others (EISE85) have shown that changes of single amino acids on the surface of the lambda Cro repressor greatly reduce its affinity for the natural operator O_R3, but greatly increase the binding of the mutant protein to a mutant operator. Changing three residues in subtilisin from Bacillus amyloliquefaciens to be the same as the corresponding residues in subtilisin from B. licheniformis produced a protease having nearly the same activity as the latter subtilisin, even though 82 amino acid sequence differences remained (WELL87a). Insertion of DNA encoding 18 amino acids (corresponding to Pro-Glu-Dynorphin-Gly) into the E. coli phoA gene so that the additional amino acids appeared within a loop of the alkaline phosphatase protein resulted in a chimeric protein having both phoA and dynorphin activity (FREI90). Thus, changing the surface of a binding protein may alter its specificity without abolishing binding activity.

D. Techniques Of Mutagenesis

Early techniques of mutating proteins involved manipulations at the amino acid sequence level. In the semisynthetic method (TSCH87), the protein was cleaved into two fragments, a residue removed from the new end of one fragment, the substitute residue added on in its place, and the modified fragment joined with the other, original fragment. Alternatively, the mutant protein could be synthesized in its entirety (TANK77).

Erickson et al. suggested that mixed amino acid reagents could be used to produce a family of sequence-related proteins which could then be screened by affinity chromatography (ERIC86). They envision successive rounds of mixed synthesis of variant proteins and purification by specific binding. They do not discuss how residues should be chosen for variation. Because proteins cannot be amplified, the researchers must sequence the recovered protein to learn which substitutions improve binding. The researchers must limit the level of diversity so that each variety of protein will be present in sufficient quantity for the isolated fraction to be sequenced.

With the development of recombinant DNA techniques, it became possible to obtain a mutant protein by mutating the gene encoding the native protein and then expressing the mutated gene. Several mutagenesis strategies are known. One, "protein surgery" (DILL87), involves the introduction of one or more predetermined mutations within the gene of choice. A single polypeptide of completely predetermined sequence is expressed, and its binding characteristics are evaluated.

At the other extreme is random mutagenesis by means of relatively nonspecific mutagens such as radiation and various chemical agents. See Ho et al. (HOCJ85) and Lehtovaara, E.P. Appln. 285,123.

It is possible to randomly vary predetermined nucleotides using a mixture of bases in the appropriate cycles of a nucleic acid synthesis procedure. The proportion of bases in the mixture, for each position of a codon, will determine the frequency at which each amino acid will occur in the polypeptides expressed from the degenerate DNA population. Oliphant et al. (OLIP86) and Oliphant and Struhl (OLIP87) have demonstrated ligation and cloning of highly degenerate oligonucleotides, which were used in the mutation of promoters. They suggested that similar methods could be used in the variation of protein coding regions. They do not say how one should: a) choose protein residues to vary, or b) select or screen mutants with desirable properties. Reidhaar-Olson and Sauer (REID88a) have used synthetic degenerate oligonucleotides to vary simultaneously two or three residues through all twenty amino acids. See also Vershon et al. (VERS86a; VERS86b). Reidhaar-Olson and Sauer do not discuss the limits on how many residues could be varied at once nor do they mention the problem of unequal abundance of DNA encoding different amino acids. They looked for proteins that either had wild-type dimerization or that did not dimerize. They did not seek proteins having novel binding properties and did not find any. This approach is likewise limited by the number of colonies that can be examined (ROBE86).

To the extent that this prior work assumes that it is desirable to adjust the level of mutation so that there is one mutation per protein, it should be noted that many desirable protein alterations require multiple amino acid substitutions and thus are not accessible through single base changes or even through all possible amino acid substitutions at any one residue.
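By way of illustration only, the limited reach of single base changes can be checked against the standard genetic code. The following Python sketch is not taken from any cited reference; the function name single_base_neighbors and the choice of the tryptophan codon TGG as the example are assumptions made here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_base_neighbors(codon):
    """Return the set of residues (or '*') reachable from `codon`
    by changing exactly one of its three bases."""
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                reachable.add(CODON_TABLE[mutant])
    reachable.discard(CODON_TABLE[codon])  # silent changes are not substitutions
    return reachable


if __name__ == "__main__":
    print(sorted(single_base_neighbors("TGG")))
```

For TGG the nine single-base mutants encode only Arg, Gly, Leu, Ser and Cys (plus two stop codons), so fourteen of the nineteen possible replacements for Trp are unreachable in one step.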

D. Affinity Chromatography of Cells

Ferenci and collaborators have published a series of papers on the chromatographic isolation of mutants of the maltose-transport protein LamB of E. coli (FERE82a, FERE82b, FERE83, FERE84, CLUN84, HEIN87 and papers cited therein). The mutants were either spontaneous or induced with nonspecific chemical mutagens. Levels of mutagenesis were picked to provide single point mutations or single insertions of two residues. No multiple mutations were sought or found.

While variation was seen in the degree of affinity for the conventional LamB substrates maltose and starch, there was no selection for affinity to a target molecule not bound at all by native LamB, and no multiple mutations were sought or found. FERE84 speculated that the affinity chromatographic selection technique could be adapted to development of similar mutants of other "important bacterial surface-located enzymes", and to selecting for mutations which result in the relocation of an intracellular bacterial protein to the cell surface. Ferenci's mutant surface proteins would not, however, have been chimeras of a bacterial surface protein and an exogenous or heterologous binding domain.

Ferenci also taught that there was no need to clone the structural gene, or to know the protein structure, active site, or sequence. The method of the present invention, however, specifically utilizes a cloned structural gene. It is not possible to construct and express a chimeric, outer surface-directed potential binding protein-encoding gene without cloning.

Ferenci did not limit the mutations to particular loci or particular substitutions. In the present invention, knowledge of the protein structure, active site and/or sequence is used as appropriate to predict which residues are most likely to affect binding activity without unduly destabilizing the protein, and the mutagenesis is focused upon those sites. Ferenci does not suggest that surface residues should be preferentially varied. In consequence, Ferenci's selection system is much less efficient than that disclosed herein.

E. Bacterial and Viral Expression of Chimeric Surface Proteins

A number of researchers have directed unmutated foreign antigenic epitopes to the surface of bacteria or phage, fused to a native bacterial or phage surface protein, and demonstrated that the epitopes were recognized by antibodies. Thus, Charbit, et al. (CHAR86) genetically inserted the C3 epitope of the VP1 coat protein of poliovirus into the LamB outer membrane protein of E. coli, and determined immunologically that the C3 epitope was exposed on the bacterial cell surface. Charbit, et al. (CHAR87) likewise produced chimeras of LamB and the A (or B) epitopes of the preS2 region of hepatitis B virus.

A chimeric LacZ/OmpB protein has been expressed in E. coli and is, depending on the fusion, directed to either the outer membrane or the periplasm (SILH77). A chimeric LacZ/OmpA surface protein has also been expressed and displayed on the surface of E. coli cells (Weinstock et al., WEIN83). Others have expressed and displayed on the surface of a cell chimeras of other bacterial surface proteins, such as E. coli type 1 fimbriae (Hedegaard and Klemm (HEDE89)) and Bacteroides nodosus type 1 fimbriae (Jennings et al., JENN89). In none of the recited cases was the inserted genetic material mutagenized.

Dulbecco (DULB86) suggests a procedure for incorporating a foreign antigenic epitope into a viral surface protein so that the expressed chimeric protein is displayed on the surface of the virus in a manner such that the foreign epitope is accessible to antibody. In 1985 Smith (SMIT85) reported inserting a nonfunctional segment of the EcoRI endonuclease gene into gene III of bacteriophage f1, "in phase". The gene III protein is a minor coat protein necessary for infectivity. Smith demonstrated that the recombinant phage were adsorbed by immobilized antibody raised against the EcoRI endonuclease, and could be eluted with acid. De la Cruz et al. (DELA88) have expressed a fragment of the repeat region of the circumsporozoite protein from Plasmodium falciparum on the surface of M13 as an insert in the gene III protein. They showed that the recombinant phage were both antigenic and immunogenic in rabbits, and that such recombinant phage could be used for B epitope mapping. The researchers suggest that similar recombinant phage could be used for T epitope mapping and for vaccine development.

None of these researchers suggested mutagenesis of the inserted material, nor is the inserted material a complete binding domain conferring on the chimeric protein the ability to bind specifically to a receptor other than the antigen combining site of an antibody.

McCafferty et al. (MCCA90) expressed a fusion of an Fv fragment of an antibody to the N-terminal of the pIII protein. The Fv fragment was not mutated.

F. Epitope Libraries on Fusion Phage

Parmley and Smith (PARM88) suggested that an epitope library that exhibits all possible hexapeptides could be constructed and used to isolate epitopes that bind to antibodies. In discussing the epitope library, the authors did not suggest that it was desirable to balance the representation of different amino acids. Nor did they teach that the insert should encode a complete domain of the exogenous protein. Epitopes are considered to be unstructured peptides as opposed to structured proteins.

After the filing of the parent application whose benefit is claimed herein under 35 U.S.C. 120, certain groups reported the construction of "epitope libraries." Scott and Smith (SCOT90) and Cwirla et al. (CWIR90) prepared "epitope libraries" in which potential hexapeptide epitopes for a target antibody were randomly mutated by fusing degenerate oligonucleotides, encoding the epitopes, with gene III of fd phage, and expressing the fused gene in phage-infected cells. The cells manufactured fusion phage which displayed the epitopes on their surface; the phage which bound to immobilized antibody were eluted with acid and studied. In both cases, the fused gene featured a segment encoding a spacer region to separate the variable region from the wild type pIII sequence so that the varied amino acids would not be constrained by the nearby pIII sequence. Devlin et al. (DEVL90) similarly screened, using M13 phage, for random 15 residue epitopes recognized by streptavidin. Again, a spacer was used to move the random peptides away from the rest of the chimeric phage protein. These references therefore taught away from constraining the conformational repertoire of the mutated residues.

Another problem with the Scott and Smith, Cwirla et al., and Devlin et al. libraries was that they provided a highly biased sampling of the possible amino acids at each position. Their primary concern in designing the degenerate oligonucleotide encoding their variable region was to ensure that all twenty amino acids were encodible at each position; a secondary consideration was minimizing the frequency of occurrence of stop signals. Consequently, Scott and Smith and Cwirla et al. employed NNK (N=equal mixture of G, A, T, C; K=equal mixture of G and T) while Devlin et al. used NNS (S=equal mixture of G and C). There was no attempt to minimize the frequency ratio of most favored-to-least favored amino acid, or to equalize the rate of occurrence of acidic and basic amino acids.
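The bias of the NNK and NNS schemes can be tallied directly from the standard genetic code. The following Python sketch is offered only as an illustration under stated assumptions: equimolar base mixtures at each position, Asp and Glu counted as the acidic residues, and Lys and Arg counted as the basic residues. The names residue_frequencies and summarize are hypothetical and do not come from the cited references.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Degenerate-base symbols used in the text, each read as an equimolar mixture.
MIXTURES = {"N": "ACGT", "K": "GT", "S": "CG"}


def residue_frequencies(scheme):
    """Map each encoded residue (or '*') to its frequency for a
    three-letter degenerate codon such as 'NNK'."""
    choices = [MIXTURES[symbol] for symbol in scheme]
    counts = Counter(CODON_TABLE["".join(c)] for c in product(*choices))
    total = sum(counts.values())
    return {aa: Fraction(n, total) for aa, n in counts.items()}


def summarize(scheme):
    """Collect the bias measures discussed in the text for one scheme."""
    freqs = residue_frequencies(scheme)
    amino = {aa: f for aa, f in freqs.items() if aa != "*"}
    return {
        "amino_acids": len(amino),
        "stop": freqs.get("*", Fraction(0)),
        "max_min_ratio": max(amino.values()) / min(amino.values()),
        "acidic": sum(amino.get(aa, Fraction(0)) for aa in "DE"),
        "basic": sum(amino.get(aa, Fraction(0)) for aa in "KR"),
    }


if __name__ == "__main__":
    for scheme in ("NNK", "NNS"):
        print(scheme, summarize(scheme))
```

Under these assumptions both schemes encode all twenty amino acids in 32 codons with a single stop codon, a 3:1 ratio between the most and least favored amino acids, and twice as many codons for Lys plus Arg as for Asp plus Glu.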

Devlin et al. characterized several affinity-selected streptavidin-binding peptides, but did not measure the affinity constants for these peptides. Cwirla et al. did determine the affinity constants for their peptides, but were disappointed to find that their best hexapeptides had affinities (350-300 nM) "orders of magnitude" weaker than that of the native Met-enkephalin epitope (7 nM) recognized by the target antibody. Cwirla et al. speculated that phage bearing peptides with higher affinities remained bound under acidic elution, possibly because of multivalent interactions between phage (carrying about 4 copies of pIII) and the divalent target IgG. Scott and Smith were able to find peptides whose affinity for the target antibody (A2) was comparable to that of the reference myohemerythrin epitope (50 nM). However, Scott and Smith likewise expressed concern that some high-affinity peptides were lost, possibly through irreversible binding of fusion phage to target.

G. Non-Commonly Owned Patents and Applications Naming Robert Ladner as an Inventor

Ladner, U.S. Pat. No. 4,704,692, "Computer Based System and Method for Determining and Displaying Possible Chemical Structures for Converting Double- or Multiple-Chain Polypeptides to Single-Chain Polypeptides" describes a design method for converting proteins composed of two or more chains into proteins of fewer polypeptide chains, but with essentially the same 3D structure. There is no mention of variegated DNA and no genetic selection. Ladner and Bird, WO88/01649 (Publ. Mar. 10, 1988) disclose the specific application of computerized design of linker peptides to the preparation of single chain antibodies.

Ladner, Glick, and Bird, WO88/06630 (publ. 7 Sep. 1988 and having priority from U.S. application Ser. No. 07/021,046, assigned to Genex Corp.) (LGB) speculate that diverse single chain antibody domains (SCAD) may be screened for binding to a particular antigen by varying the DNA encoding the combining determining regions of a single chain antibody, subcloning the SCAD gene into the gpV gene of phage lambda so that a SCAD/gpV chimera is displayed on the outer surface of phage lambda, and selecting phage which bind to the antigen through affinity chromatography. The only antigen mentioned is bovine growth hormone. No other binding molecules, targets, carrier organisms, or outer surface proteins are discussed. Nor is there any mention of the method or degree of mutagenesis. Furthermore, there is no teaching as to the exact structure of the fusion nor of how to identify a successful fusion or how to proceed if the SCAD is not displayed.

Ladner and Bird, WO88/06601 (publ. 7 Sep. 1988) suggest that single chain "pseudodimeric" repressors (DNA-binding proteins) may be prepared by mutating a putative linker peptide followed by in vivo selection, and that mutation and selection may be used to create a dictionary of recognition elements for use in the design of asymmetric repressors. The repressors are not displayed on the outer surface of an organism.

Methods of identifying residues in a protein which can be replaced with a cysteine in order to promote the formation of a protein-stabilizing disulfide bond are given in Pantoliano and Ladner, U.S. Pat. No. 4,903,773 (PANT90), Pantoliano and Ladner (PANT87), Pabo and Suchanek (PABO86), MATS89, and SAUE86.

No admission is made that any cited reference is prior art or pertinent prior art, and the dates given are those appearing on the reference and may not be identical to the actual publication date. All references cited in this specification are hereby incorporated by reference.

SUMMARY OF THE INVENTION

The present invention is intended to overcome the deficiencies discussed above. It relates to the construction, expression, and selection of mutated genes that specify novel proteins with desirable binding properties, as well as these proteins themselves. The substances bound by these proteins, hereinafter referred to as "targets", may be, but need not be, proteins. Targets may include other biological or synthetic macromolecules as well as other organic and inorganic substances.

The fundamental principle of the invention is one of forced evolution. In nature, evolution results from the combination of genetic variation, selection for advantageous traits, and reproduction of the selected individuals, thereby enriching the population for the trait. The present invention achieves genetic variation through controlled random mutagenesis ("variegation") of DNA, yielding a mixture of DNA molecules encoding different but related potential binding proteins. It selects for mutated genes that specify novel proteins with desirable binding properties by 1) arranging that the product of each mutated gene be displayed on the outer surface of a replicable genetic package (GP) (a cell, spore or virus) that contains the gene, and 2) using affinity selection--selection for binding to the target material--to enrich the population of packages for those packages containing genes specifying proteins with improved binding to that target material. Finally, enrichment is achieved by allowing only the genetic packages which, by virtue of the displayed protein, bound to the target, to reproduce. The evolution is "forced" in that selection is for the target material provided.
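The effect of repeated affinity selection and amplification can be illustrated with a simple two-population calculation. The Python sketch below is a toy model only: it assumes that every package is either a "binder" or "background", that each class is retained by the target with a fixed probability per round, and that amplification multiplies both classes equally. The function name and all numerical values are hypothetical and are not measurements from this specification.

```python
def binder_fraction(initial_fraction, keep_binder, keep_background, rounds):
    """Fraction of the population that are binders after `rounds` cycles of
    affinity selection followed by amplification.

    Amplification is assumed to scale both classes by the same factor,
    so it cancels out of the fraction.
    """
    binders = initial_fraction * keep_binder ** rounds
    background = (1.0 - initial_fraction) * keep_background ** rounds
    return binders / (binders + background)


if __name__ == "__main__":
    # Hypothetical values: one binder per million packages, 50% of binders
    # and 0.1% of background packages retained in each round.
    for n in range(4):
        print(n, binder_fraction(1e-6, 0.5, 0.001, n))
```

With these hypothetical retention values each round enriches binders about 500-fold, so a variant present at one part per million would dominate the population after three rounds.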

The display strategy is first perfected by modifying a genetic package to display a stable, structured domain (the "initial potential binding domain", IPBD) for which an affinity molecule (which may be an antibody) is obtainable. The success of the modifications is readily measured by, e.g., determining whether the modified genetic package binds to the affinity molecule.

The IPBD is chosen with a view to its tolerance for extensive mutagenesis. Once it is known that the IPBD can be displayed on a surface of a package and subjected to affinity selection, the gene encoding the IPBD is subjected to a special pattern of multiple mutagenesis, here termed "variegation", which after appropriate cloning and amplification steps leads to the production of a population of genetic packages each of which displays a single potential binding domain (a mutant of the IPBD), but which collectively display a multitude of different though structurally related potential binding domains (PBDs). Each genetic package carries the version of the pbd gene that encodes the PBD displayed on the surface of that particular package. Affinity selection is then used to identify the genetic packages bearing the PBDs with the desired binding characteristics, and these genetic packages may then be amplified. After one or more cycles of enrichment by affinity selection and amplification, the DNA encoding the successful binding domains (SBDs) may then be recovered from selected packages.

If need be, the DNA from the SBD-bearing packages may then be further "variegated", using an SBD of the last round of variegation as the "parental potential binding domain" (PPBD) to the next generation of PBDs, and the process continued until the worker in the art is satisfied with the result. At that point, the SBD may be produced by any conventional means, including chemical synthesis.

When the number of different amino acid sequences obtainable by mutation of the domain is large when compared to the number of different domains which are displayable in detectable amounts, the efficiency of the forced evolution is greatly enhanced by careful choice of which residues are to be varied. First, residues of a known protein which are likely to affect its binding activity (e.g., surface residues) and not likely to unduly degrade its stability are identified. Then all or some of the codons encoding these residues are varied simultaneously to produce a variegated population of DNA. The variegated population of DNA is used to express a variety of potential binding domains, whose ability to bind the target of interest may then be evaluated.

The method of the present invention is thus further distinguished from other methods in the nature of the highly variegated population that is produced and from which novel binding proteins are selected. We force the displayed potential binding domain to sample the nearby "sequence space" of related amino-acid sequences in an efficient, organized manner. Four goals guide the various variegation plans used herein, preferably: 1) a very large number (e.g. 10⁷) of variants is available, 2) a very high percentage of the possible variants actually appears in detectable amounts, 3) the frequency of appearance of the desired variants is relatively uniform, and 4) variation occurs only at a limited number of amino-acid residues, most preferably at residues having side groups directed toward a common region on the surface of the potential binding domain.
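The interplay between goals 1), 2) and 4) can be illustrated numerically. The Python sketch below assumes, purely for illustration, that each varied residue may be any of the twenty amino acids with equal probability and that the library consists of independently drawn clones, so that the expected fraction of variants present follows the usual Poisson approximation 1 - exp(-N/V). The function names and the library size of 10⁷ clones used in the example are assumptions, not values prescribed by this specification.

```python
import math


def sequence_space(n_varied_residues, alphabet_size=20):
    """Number of distinct amino-acid sequences when every varied residue
    may take any of `alphabet_size` values."""
    return alphabet_size ** n_varied_residues


def expected_coverage(n_clones, n_variants):
    """Expected fraction of variants represented at least once, assuming
    clones are drawn independently and uniformly (Poisson approximation)."""
    return -math.expm1(-n_clones / n_variants)


if __name__ == "__main__":
    library_size = 10 ** 7  # hypothetical number of independent clones
    for n in range(4, 8):
        variants = sequence_space(n)
        print(n, variants, round(expected_coverage(library_size, variants), 3))
```

Under these assumptions a library of 10⁷ clones would be expected to contain roughly 96 percent of the 3.2 x 10⁶ sequences obtained by varying five residues, but only about 14 percent of the 6.4 x 10⁷ sequences obtained by varying six, which is why the number of varied residues must be limited.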

This is to be distinguished from the simple use of indiscriminate mutagenic agents such as radiation and hydroxylamine to modify a gene, where there is no (or very oblique) control over the site of mutation. Many of the mutations will affect residues that are not a part of the binding domain. Moreover, since at a reasonable level of mutagenesis, any modified codon is likely to be characterized by a single base change, only a limited and biased range of possibilities will be explored. Equally remote is the use of site-specific mutagenesis techniques employing mutagenic oligonucleotides of nonrandomized sequence, since these techniques do not lend themselves to the production and testing of a large number of variants. While focused random mutagenesis techniques are known, the importance of controlling the distribution of variation has been largely overlooked.

In order to obtain the display of a multitude of different though related potential binding domains, applicants generate a heterogeneous population of replicable genetic packages each of which comprises a hybrid gene including a first DNA sequence which encodes a potential binding domain for the target of interest and a second DNA sequence which encodes a display means, such as an outer surface protein native to the genetic package but not natively associated with the potential binding domain (or the parental binding domain to which it is related) which causes the genetic package to display the corresponding chimeric protein (or a processed form thereof) on its outer surface.

It should be recognized that by expressing a hybrid protein which comprises an outer surface transport signal not natively associated with the binding domain, the utility of the present invention is greatly extended. The binding domain need not be that of a surface protein of the genetic package (or, in the case of a viral package, of its host cell), since the provided outer surface transport signal is responsible for achieving the desired display. Thus, it is possible to display on the surface of a phage, bacterial cell or bacterial spore a binding domain related to the binding domain of a normally cytoplasmic binding protein, or the binding domain of a eukaryotic protein which is not found on the surface of prokaryotic cells or viruses.

Another important aspect of the invention is that each potential binding domain remains physically associated with the particular DNA molecule which encodes it. Thus, once successful binding domains are identified, one may readily recover the gene and either express additional quantities of the novel binding protein or further mutate the gene. The form that this association takes is a "replicable genetic package", a virus, cell or spore which replicates and expresses the binding domain-encoding gene, and transports the binding domain to its outer surface.

It is also possible chemically or enzymatically to modify the PBDs before selection. The selection then identifies the best modified amino acid sequence. For example, we could treat the variegated population of genetic packages that display a variegated population of binding domains with a protein tyrosine kinase and then select for binding the target. Any tyrosines on the BD surface will be phosphorylated and this could affect the binding properties. Other chemical or enzymatic modifications are possible.

By virtue of the present invention, proteins are obtained which can bind specifically to targets other than the antigen-combining sites of antibodies. A protein is not to be considered a "binding protein" merely because it can be bound by an antibody (see definition of "binding protein" which follows). While almost any amino acid sequence of more than about 6-8 amino acids is likely, when linked to an immunogenic carrier, to elicit an immune response, any given random polypeptide is unlikely to satisfy the stringent definition of "binding protein" with respect to minimum affinity and specificity for its substrate. It is only by testing numerous random polypeptides simultaneously (and, in the usual case, controlling the extent and character of the sequence variation, i.e., limiting it to residues of a potential binding domain having a stable structure, the residues being chosen as more likely to affect binding than stability) that this obstacle is overcome.

In one embodiment, the invention relates to:

a) preparing a variegated population of replicable genetic packages, each package including a nucleic acid construct coding for an outer-surface-displayed potential binding protein other than an antibody, comprising (i) a structural signal directing the display of the protein (or a processed form thereof) on the outer surface of the package and (ii) a potential binding domain for binding said target, where the population collectively displays a multitude of different potential binding domains having a substantially predetermined range of variation in sequence,

b) causing the expression of said protein and the display of said protein on the outer surface of such packages,

c) contacting the packages with target material, other than an antibody with an exposed antigen-combining site, so that the potential binding domains of the proteins and the target material may interact, and separating packages bearing a potential binding domain that succeeds in binding the target material from packages that do not so bind,

d) recovering and replicating at least one package bearing a successful binding domain,

e) determining the amino acid sequence of the successful binding domain of a genetic package which bound to the target material,

f) preparing a new variegated population of replicable genetic packages according to step (a), the parental potential binding domain for the potential binding domains of said new packages being a successful binding domain whose sequence was determined in step (e), and repeating steps (b)-(e) with said new population, and, when a package bearing a binding domain of desired binding characteristics is obtained,

g) abstracting the DNA encoding the desired binding domain from the genetic package and placing it into a suitable expression system. (The binding domain may then be expressed as a unitary protein, or as a domain of a larger protein).

The invention is not limited to proteins with a single BD, since the method may be applied to any or all of the BDs of the protein, sequentially or simultaneously. Nor is the invention limited to biological synthesis of the binding domains; peptides having an amino-acid sequence determined by the isolated DNA can be chemically synthesized.

The invention further relates to a variegated population of genetic packages. Said population may be used by one user to select for binding to a first target, by a second user to select for binding to a second target, and so on, as the present invention does not require that the initial potential binding domain actually bind to the target of interest, and the variegation is at residues likely to affect binding. The invention also relates to the variegated DNA used in preparing such genetic packages.

The invention likewise encompasses the procedure by which the display strategy is verified. The genetic packages are engineered to display a single IPBD sequence. (Variability may be introduced into DNA subsequences adjacent to the ipbd subsequence and within the osp-ipbd gene so that the IPBD will appear on the GP surface.) A molecule, such as an antibody, having high affinity for correctly folded IPBD is used to: a) detect IPBD on the GP surface, b) screen colonies for display of IPBD on the GP surface, or c) select GPs that display IPBD from a population, some members of which might display IPBD on the GP surface. In one preferred embodiment, this verification process (part I) involves:

1) choosing a GP such as a bacterial cell, bacterial spore, or phage, having a suitable outer surface protein (OSP),

2) choosing a stable IPBD,

3) designing an amino acid sequence that: a) includes the IPBD as a subsequence and b) will cause the IPBD to appear on the GP surface,

4) engineering a gene, denoted osp-ipbd, that: a) codes for the designed amino acid sequence, b) provides the necessary genetic regulation, and c) introduces convenient sites for genetic manipulation,

5) cloning the osp-ipbd gene into the GP, and

6) harvesting the transformed GPs and testing them for presence of IPBD on the GP surface; this test is performed with an affinity molecule having high affinity for IPBD, denoted AfM(IPBD).

Once a GP(IPBD) is produced, it can be used many times as the starting point for developing different novel proteins that bind to a variety of different targets. The knowledge of how we engineer the appearance of one IPBD on the surface of a GP can be used to design and produce other GP(IPBD)s that display different IPBDs.

Knowing that a particular genetic package and osp-ipbd fusion are suitable for the practice of the invention, we may variegate the genetic packages and select for binding to a target of interest. Using IPBD as the PPBD to the first cycle of variegation, we prepare a wide variety of osp-pbd genes that encode a wide variety of PBDs. We use an affinity separation to enrich the population of GP(vgPBD)s for GPs that display PBDs with binding properties relative to the target that are superior to the binding properties of the PPBD. An SBD selected from one variegation cycle becomes the PPBD to the next variegation cycle. In a preferred embodiment, Part II of the process of the present invention involves:

1) picking a target molecule, and an affinity separation system which selects for proteins having an affinity for that target molecule,

2) picking a GP(IPBD),

3) picking a set of several residues in the PPBD to vary; the principal indicators of which residues to vary include: a) the 3D structure of the IPBD, b) sequences of homologous proteins, and c) computer or theoretical modeling that indicates which residues can tolerate different amino acids without disrupting the underlying structure,

4) picking a subset of the residues picked in Part II.3, to be varied simultaneously; the principal considerations are the number of different variants and which variants are within the detection capabilities of the affinity separation system, and setting the range of variation;

5) implementing the variegation by:

a) synthesizing the part of the osp-pbd gene that encodes the residues to be varied using a specific mixture of nucleotide substrates for some or all of the bases encoding residues slated for variation, thereby creating a population of DNA molecules, denoted vgDNA,

b) ligating this vgDNA, by standard methods, into the operative cloning vector (OCV) (e.g. a plasmid or bacteriophage),

c) using the ligated DNA to transform cells, thereby producing a population of transformed cells,

d) culturing (i.e. increasing in number) the population of transformed cells and harvesting the population of GP(PBD)s, said population being denoted as GP(vgPBD),

e) enriching the population for GPs that bind the target by using affinity separation, with the chosen target molecule as affinity molecule,

f) repeating steps II.5.d and II.5.e until a GP(SBD) having improved binding to the target is isolated, and

g) testing the isolated SBD or SBDs for affinity and specificity for the chosen target,

6) repeating steps II.3, II.4, and II.5 until the desired degree of binding is obtained.

Part II is repeated for each new target material. Part I need be repeated only if no GP(IPBD) suitable to a chosen target is available.

For each target, there are a large number of SBDs that may be found by the method of the present invention. The process relies on a combination of protein structural considerations, probabilities, and targeted mutations with accumulation of information. To increase the probability that some PBD in the population will bind to the target, we generate as large a population as we can conveniently subject to selection-through-binding in one experiment. Key questions in management of the method are "How many transformants can we produce?", and "How small a component can we find through selection-through-binding?". The optimum level of variegation is determined by the maximum number of transformants and the selection sensitivity, so that for any reasonable sensitivity we may use a progressive process to obtain a series of proteins with higher and higher affinity for the chosen target material.
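
The interplay between level of variegation, number of transformants and selection sensitivity can be illustrated with a minimal sketch. It assumes, purely for illustration, that every varied residue is allowed the same number of amino acids and that all protein variants are equally represented; the function names and the numbers in the usage note are hypothetical and are not taken from this specification.

```python
def library_size(n_residues, aa_per_residue=20):
    """Number of distinct protein variants when n_residues positions
    are each allowed aa_per_residue amino acids (illustrative model)."""
    return aa_per_residue ** n_residues


def max_varied_residues(n_transformants, one_in_n_sensitivity, aa_per_residue=20):
    """Largest number of residues that can be varied simultaneously such that
    the library neither exceeds the number of transformants nor falls below
    the 'can find 1 in N' sensitivity of the separation (assumed model)."""
    limit = min(n_transformants, one_in_n_sensitivity)
    n = 0
    # Add one more varied residue as long as the library still fits the limit.
    while library_size(n + 1, aa_per_residue) <= limit:
        n += 1
    return n
```

With a hypothetical 10⁸ transformants and a separation able to find 1 in 10⁶, `max_varied_residues(10**8, 10**6)` returns 4, because 20⁴ = 160,000 variants fit within the limit while 20⁵ = 3,200,000 do not. Residues not varied in one cycle can be varied in a later cycle, which is the progressive process referred to above.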

The appended claims are hereby incorporated by reference into this specification as an enumeration of the preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows how a phage may be used as a genetic package. At (a) we have a wild-type precoat protein lodged in the lipid bilayer. The signal peptide is in the periplasmic space. At (b), a chimeric precoat protein, with a potential binding domain interposed between the signal peptide and the mature coat protein sequence, is similarly trapped. At (c) and (d), the signal peptide has been cleaved off the wild-type and chimeric proteins, respectively, but certain residues of the coat protein sequence interact with the lipid bilayer to prevent the mature protein from passing entirely into the periplasm. At (e) and (f), mature wild-type and chimeric protein are assembled into the coat of a single stranded DNA phage as it emerges into the periplasmic space. The phage will pass through the outer membrane into the medium where it can be recovered and chromatographically evaluated.

FIG. 2 depicts (a) the optimal stereochemistry of a disulfide bond, based on Creighton, "Disulfide Bonds and Protein Stability" (CREI88) (the two possible torsion angles about the disulfide bond of +90° and -90° are equally likely), and (b) the standard geometric parameters for the disulfide bond, following Katz and Kossiakoff (KATZ86). The average Cα-Cα distance is 5-6 Å, and the typical S--S bond length is ≈2.0 Å. Many left-hand disulfides adopt as a preferred geometry χ1=-60°, χ2=-60°, χ3=-85°, χ2'=-60°, χ1'=-60°, Cα-Cα=5.88 Å; right-hand disulfides are more variable.

FIG. 3 shows a mini-protein comprising eight residues, numbered 4 through 11 and in which residues 5 and 10 are joined by a disulfide. The β carbons are labeled for residues 4, 6, 7, 8, 9, and 11; these residues are preferred sites of variegation.

FIG. 4 shows the Cα of the coat protein of phage f1.

FIG. 5 shows the construction of M13-MB51.

FIG. 6 shows construction of MK-BPTI, also known as BPTI-III MK.

FIG. 7 illustrates fractionation of the Mini PEPI library on HNE beads. The abscissa shows pH of buffer. The ordinate shows amount of phage (as fraction of input phage) obtained at given pH. Ordinate scaled by 10³.

FIG. 8 illustrates fractionation of the MYMUT PEPI library on HNE beads. The abscissa shows pH of buffer. The ordinate shows amount of phage (as fraction of input phage) obtained at given pH. Ordinate scaled by 10³.

FIG. 9 shows the elution profiles for EpiNE clones 1, 3, and 7. Each profile is scaled so that the peak is 1.0 to emphasize the shape of the curve.

FIG. 10 shows pH profile for the binding of BPTI-III MK and EpiNE1 on cathepsin G beads. The abscissa shows pH of buffer. The ordinate shows amount of phage (as fraction of input phage) obtained at given pH. Ordinate scaled by 10³.

FIG. 11 shows pH profile for the fractionation of the MYMUT Library on cathepsin G beads. The abscissa shows pH of buffer. The ordinate shows amount of phage (as fraction of input phage) obtained at given pH. Ordinate scaled by 10³.

FIG. 12 shows a second fractionation of MYMUT library over cathepsin G.

FIG. 13 shows elution profiles on immobilized cathepsin G for phage selected for binding to cathepsin G.

FIG. 14 shows the Cαs of BPTI and interaction set #2.

FIG. 15 shows the main chain of scorpion toxin (Brookhaven Protein Data Bank entry 1SN3) residues 20 through 42. CYS₂₅ and CYS₄₁ are shown forming a disulfide. In the native protein these groups form disulfides to other cysteines, but no main-chain motion is required to bring the gamma sulphurs into acceptable geometry. Residues, other than GLY, are labeled at the β carbon with the one-letter code.

FIG. 16 shows profiles of the elution of phage that display EpiNE7 and EpiNE7.23 from HNE beads.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

OVERVIEW

I. DEFINITIONS AND ABBREVIATIONS

II. THE INITIAL POTENTIAL BINDING DOMAIN

A. Generally

B. Influence of Target Size on Choice of IPBD

C. Influence of Target Charge on Choice of IPBD

D. Other Considerations in the Choice of IPBD

E. Bovine Pancreatic Trypsin Inhibitor (BPTI) as an IPBD

F. Mini-Proteins as IPBDs

G. Modified PBDs

III. VARIEGATION STRATEGY - MUTAGENESIS TO OBTAIN POTENTIAL BINDING DOMAINS WITH DESIRED DIVERSITY

A. Generally

B. Identification of Residues to be Varied

C. Determining the Substitution Set for Each Parental Residue

D. Special Considerations Relating to Variegation of Mini-Proteins with Essential Cysteines

E. Planning the Second and Later Rounds of Variegation

IV. DISPLAY STRATEGY - DISPLAYING FOREIGN BINDING DOMAINS ON THE SURFACE OF A "GENETIC PACKAGE"

A. General Requirements for Genetic Package

B. Phages for Use as Genetic Packages

C. Bacterial Cells as Genetic Packages

D. Bacterial Spores as Genetic Packages

E. Artificial Outer Surface Protein

F. Designing the osp::ipbd Gene Insert

G. Synthesis of Gene Inserts

H. Operative Cloning Vector

I. Transformation of Cells

J. Verification of Display Strategy

K. Analysis and Correction of Display Problems

V. AFFINITY SELECTION OF TARGET-BINDING MUTANTS

A. Affinity Separation Technology, Generally

B. Affinity Chromatography, Generally

C. Fluorescent-Activated Cell Sorting, Generally

D. Affinity Electrophoresis, Generally

E. Target Materials

F. Immobilization or Labeling of Target Material

G. Elution of Lower Affinity PBD-Bearing Packages

H. Optimization of Affinity Separation

I. Measuring the Sensitivity of Affinity Separation

J. Measuring the Efficiency of Separation

K. Reducing Selection due to Non-Specific Binding

L. Isolation of Genetic Package PBDs with Binding-to-Target Phenotypes

M. Recovery of Packages

N. Amplifying the Enriched Packages

O. Determining Whether Further Enrichment is Needed

P. Characterizing the Putative SBDs

Q. Joint Selections

R. Selection for Non-Binding

S. Selection of Potential Binding Domains for Retention of Structure

T. Engineering of Antagonists

VI. EXPLOITATION OF SUCCESSFUL BINDING DOMAINS AND CORRESPONDING DNAS

A. Generally

B. Production of Novel Binding Proteins

C. Mini-Protein Production

D. Uses of Novel Binding Proteins

VII. EXAMPLES

I. DEFINITIONS AND ABBREVIATIONS

Let K_D(x,y) be a dissociation constant, K_D(x,y) = [x][y]/[x:y], where [x], [y] and [x:y] are the equilibrium concentrations of free x, free y and the x:y complex. For the purposes of the appended claims, a protein P is a binding protein if (1) for one molecular, ionic or atomic species A, other than the variable domain of an antibody, the dissociation constant K_D(P,A) < 10⁻⁶ moles/liter (preferably, < 10⁻⁷ moles/liter), and (2) for a different molecular, ionic or atomic species B, K_D(P,B) > 10⁻⁴ moles/liter (preferably, > 10⁻¹ moles/liter). As a result of these two conditions, the protein P exhibits specificity for A over B, and a minimum degree of affinity (or avidity) for A.
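
The two-part test for a "binding protein" can be written out as a short sketch. It takes the dissociation constant in the bimolecular form [A][B]/[A:B] listed in the abbreviation table and applies the two thresholds stated above; the function and constant names are hypothetical and are introduced only for illustration.

```python
KD_TARGET_MAX = 1e-6      # moles/liter: K_D(P,A) must be below this
KD_NONTARGET_MIN = 1e-4   # moles/liter: K_D(P,B) must be above this


def dissociation_constant(free_x, free_y, complex_xy):
    """K_D = [x][y]/[x:y], all concentrations in moles/liter at equilibrium."""
    return free_x * free_y / complex_xy


def is_binding_protein(kd_target, kd_nontarget):
    """True when the protein binds A tightly enough (condition 1) and
    binds B weakly enough (condition 2) to show specificity for A over B."""
    return kd_target < KD_TARGET_MAX and kd_nontarget > KD_NONTARGET_MIN
```

A protein with K_D(P,A) = 10⁻⁹ moles/liter and K_D(P,B) = 10⁻³ moles/liter passes both conditions; one with K_D(P,A) = 10⁻⁵ moles/liter fails the first regardless of how weakly it binds B.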

The exclusion of "variable domain of an antibody" in (1) above is intended to make clear that for the purposes herein a protein is not to be considered a "binding protein" merely because it is antigenic. However, an antigen may nonetheless qualify as a binding protein because it specifically binds to a substance other than an antibody, e.g., an enzyme for its substrate, or a hormone for its cellular receptor. Additionally, it should be pointed out that "binding protein" may include a protein which binds specifically to the Fc of an antibody, e.g., staphylococcal protein A.

Normally, the binding protein will not be an antibody or an antigen-binding derivative thereof. An antibody is a crosslinked complex of four polypeptides (two heavy and two light chains). The light chains of IgG have a molecular weight of ≈23,000 daltons and the heavy chains of ≈53,000 daltons. A single binding unit is composed of the variable region of a heavy chain (V_H) and the variable region of a light chain (V_L), each about 110 amino-acid residues. The V_H and V_L regions are held in proximity by a disulfide bond between the adjoining C_L and C_H1 regions; altogether, these total 440 residues and correspond to an Fab fragment. Derivatives of antibodies include Fab fragments and the individual variable light and heavy domains. A special case of antibody derivative is a "single chain antibody." A "single-chain antibody" is a single chain polypeptide comprising at least 200 amino acids, said amino acids forming two antigen-binding regions connected by a peptide linker that allows the two regions to fold together to bind the antigen in a manner akin to that of an Fab fragment. Either the two antigen-binding regions must be variable domains of known antibodies, or they must (1) each fold into a β barrel of nine strands that are spatially related in the same way as are the nine strands of known antibody variable light or heavy domains, and (2) fit together in the same way as do the variable domains of said known antibody. Generally speaking, this will require that, with the exception of the amino acids corresponding to the hypervariable region, there is at least 88% homology with the amino acids of the variable domain of a known antibody.

While the present invention may be used to develop novel antibodies through variegation of codons corresponding to the hypervariable region of an antibody's variable domain, its primary utility resides in the development of binding proteins which are not antibodies or even variable domains of antibodies. Novel antibodies can be obtained by immunological techniques; novel enzymes, hormones, etc. cannot.

It will be appreciated that, as a result of evolution, the antigen-binding domains of antibodies have acquired a structure which tolerates great variability of sequence in the hypervariable regions. The remainder of the variable domain is made up of constant regions forming a distinctive structure, a nine strand β barrel, which hold the hypervariable regions (inter-strand loops) in a fixed relationship with each other. Most other binding proteins lack this molecular design which facilitates diversification of binding characteristics. Consequently, the successful development of novel antibodies by modification of sequences encoding known hypervariable regions--which, in nature, vary from antibody to antibody--does not provide any guidance or assurance of success in the development of novel, non-immunoglobulin binding proteins.

It should further be noted that the affinity of antibodies for their target epitopes is typically on the order of 10⁶ to 10¹⁰ liters/mole; many enzymes exhibit much greater affinities (10⁹ to 10¹⁵ liters/mole) for their preferred substrates. Thus, if the goal is to develop a binding protein with a very high affinity for a target of interest, e.g., greater than 10¹⁰ liters/mole, the antibody design may in fact be unduly limiting. Furthermore, the complementarity-determining residues of an antibody comprise many residues, 30 to 50. In most cases, it is not known which of these residues participates directly in binding antigen. Thus, picking an antibody as PPBD does not allow us to focus variegation to a small number of residues.
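
The affinities in this paragraph are association constants in liters/mole, whereas the definition of "binding protein" is stated in dissociation constants in moles/liter. For a simple bimolecular equilibrium the two are reciprocals, and the following sketch (with a hypothetical function name) makes the conversion explicit.

```python
def association_to_dissociation(ka_liters_per_mole):
    """Convert an association constant (liters/mole) into the corresponding
    dissociation constant (moles/liter) for a 1:1 bimolecular equilibrium."""
    if ka_liters_per_mole <= 0:
        raise ValueError("association constant must be positive")
    return 1.0 / ka_liters_per_mole
```

On this basis the antibody range of 10⁶ to 10¹⁰ liters/mole corresponds to dissociation constants of 10⁻⁶ to 10⁻¹⁰ moles/liter, and the enzyme range of 10⁹ to 10¹⁵ liters/mole to 10⁻⁹ to 10⁻¹⁵ moles/liter.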

Most larger proteins fold into distinguishable globules called domains (ROSS81). Protein domains have been defined various ways, but all definitions fall into one of three classes: a) those that define a domain in terms of 3D atomic coordinates, b) those that define a domain as an isolable, stable fragment of a larger protein, and c) those that define a domain based on protein sequence homology plus a method from class a) or b). Frequently, different methods of defining domains applied to a single protein yield identical or very similar domain boundaries. The diversity of definitions for domains stems from the many ways that protein domains are perceived to be important, including the concept of domains in predicting the boundaries of stable fragments, and the relationship of domains to protein folding, function, stability, and evolution. The present invention emphasizes the retention of the structured character of a domain even though its surface residues are mutated. Consequently, definitions of "domain" which emphasize stability--retention of the overall structure in the face of perturbing forces such as elevated temperatures or chaotropic agents--are favored, though atomic coordinates and protein sequence homology are not completely ignored.

When a domain of a protein is primarily responsible for the protein's ability to specifically bind a chosen target, it is referred to herein as a "binding domain" (BD). A preliminary operation is to engineer the appearance of a stable protein domain, denoted as an "initial potential binding domain" (IPBD), on the surface of a genetic package.

The term "variegated DNA" (vgDNA) refers to a mixture of DNA molecules of the same or similar length which, when aligned, vary at some codons so as to encode at each such codon a plurality of different amino acids, but which encode only a single amino acid at other codon positions. It is further understood that in variegated DNA, the codons which are variable, and the range and frequency of occurrence of the different amino acids which a given variable codon encodes, are determined in advance by the synthesizer of the DNA, even though the synthetic method does not allow one to know, a priori, the sequence of any individual DNA molecule in the mixture. The number of designated variable codons in the variegated DNA is preferably no more than 20 codons, and more preferably no more than 5-10 codons. The mix of amino acids encoded at each variable codon may differ from codon to codon.
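
How the range and frequency of amino acids at a variable codon follow from the nucleotide mixture chosen by the synthesizer can be shown with a minimal sketch. It uses the standard genetic code and assumes that the three base positions are mixed independently. The function name and the NNK mixture used in the usage note (equimolar A, C, G, T at the first two positions; equimolar G, T at the third) are illustrative assumptions and are not a mixture prescribed by this specification.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for (i, b1), (j, b2), (k, b3) in product(enumerate(BASES), repeat=3)
}


def encoded_amino_acids(codon_mix):
    """codon_mix is a sequence of three dicts, one per codon position, each
    mapping a base to its fraction in the synthesis mixture (summing to 1).
    Returns a dict mapping each encoded amino acid ('*' for stop) to the
    fraction of DNA molecules that encode it at this codon."""
    distribution = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(*(m.items() for m in codon_mix)):
        aa = CODON_TABLE[b1 + b2 + b3]
        distribution[aa] = distribution.get(aa, 0.0) + p1 * p2 * p3
    return distribution


# Illustrative mixture only: "NNK" in IUPAC notation.
N = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
K = {"G": 0.5, "T": 0.5}
nnk = encoded_amino_acids([N, N, K])
```

Under this illustrative mixture all twenty amino acids are encoded, a stop codon (TAG) appears in 1/32 of the molecules, and the amino acids are not equally abundant: serine, for example, is encoded by three of the 32 codons and methionine by one.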

A population of genetic packages into which variegated DNA has been introduced is likewise said to be "variegated".

For the purposes of this invention, the term "potential binding protein" refers to a protein encoded by one species of DNA molecule in a population of variegated DNA wherein the region of variation appears in one or more subsequences encoding one or more segments of the polypeptide having the potential of serving as a binding domain for the target substance.

From time to time, it may be helpful to speak of the "parent sequence" of the variegated DNA. When the novel binding domain sought is an analogue of a known binding domain, the parent sequence is the sequence that encodes the known binding domain. The variegated DNA will be identical with this parent sequence at one or more loci, but will diverge from it at chosen loci. When a potential binding domain is designed from first principles, the parent sequence is a sequence which encodes the amino acid sequence that has been predicted to form the desired binding domain, and the variegated DNA is a population of "daughter DNAs" that are related to that parent by a recognizable sequence similarity.

A "chimeric protein" is a fusion of a first amino acid sequence (protein) with a second amino acid sequence defining a domain foreign to and not substantially homologous with any domain of the first protein. A chimeric protein may present a foreign domain which is found (albeit in a different protein) in an organism which also expresses the first protein, or it may be an "interspecies", "intergeneric", etc. fusion of protein structures expressed by different kinds of organisms.

One amino acid sequence of the chimeric proteins of the present invention is typically derived from an outer surface protein of a "genetic package" as hereafter defined. The second amino acid sequence is one which, if expressed alone, would have the characteristics of a protein (or a domain thereof) but is incorporated into the chimeric protein as a recognizable domain thereof. It may appear at the amino or carboxy terminal of the first amino acid sequence (with or without an intervening spacer), or it may interrupt the first amino acid sequence. The first amino acid sequence may correspond exactly to a surface protein of the genetic package, or it may be modified, e.g., to facilitate the display of the binding domain.

In the present invention, the words "select" and "selection" are used in the genetic sense; i.e., a biological process whereby a phenotypic characteristic is used to enrich a population for those organisms displaying the desired phenotype.

One affinity separation is called a "separation cycle"; one pass of variegation followed by as many separation cycles as are needed to isolate an SBD, is called a "variegation cycle". The amino acid sequence of one SBD from one round becomes the PPBD to the next variegation cycle. We perform variegation cycles iteratively until the desired affinity and specificity of binding between an SBD and chosen target are achieved.
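
The relation between separation cycles and enrichment can be sketched numerically. The model below is an assumption made for illustration only: each separation cycle is taken to multiply the ratio of binding to non-binding packages by a constant factor (in the spirit of the enrichment per pass listed in the abbreviation table), and amplification between cycles is taken not to change that ratio. The function name and default threshold are hypothetical.

```python
def separation_cycles_needed(initial_fraction, enrichment_per_pass, target_fraction=0.5):
    """Number of separation cycles before binders make up at least
    target_fraction of the population, under a constant-enrichment model."""
    if not 0 < initial_fraction < 1 or not 0 < target_fraction < 1:
        raise ValueError("fractions must lie strictly between 0 and 1")
    if enrichment_per_pass <= 1:
        raise ValueError("enrichment per pass must exceed 1")
    # Work with odds (binders : non-binders) so that enrichment is a simple product.
    odds = initial_fraction / (1 - initial_fraction)
    target_odds = target_fraction / (1 - target_fraction)
    cycles = 0
    while odds < target_odds:
        odds *= enrichment_per_pass
        cycles += 1
    return cycles
```

Under this model, a binder present at 1 part in 10⁶ needs three separation cycles to reach half of the population when each pass enriches a hundredfold, and six cycles when each pass enriches tenfold. If the number required exceeds the cycles that can practically be run in one variegation cycle, a lower level of variegation is indicated.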

The following abbreviations will be used throughout the present specification:

    ______________________________________
    Abbreviation   Meaning
    ______________________________________
    GP             Genetic Package, e.g. a bacteriophage
    wtGP           Wild-type GP
    X              Any protein
    x              The gene for protein X
    BD             Binding Domain
    BPTI           Bovine pancreatic trypsin inhibitor, identical to aprotinin (Merck Index, entry 784, p. 199) (SEQ ID NO: 44)
    IPBD           Initial Potential Binding Domain, e.g. BPTI
    PBD            Potential Binding Domain, e.g. a derivative of BPTI
    SBD            Successful Binding Domain, e.g. a derivative of BPTI selected for binding to a target
    PPBD           Parental Potential Binding Domain, i.e. an IPBD or an SBD from a previous selection
    OSP            Outer Surface Protein, e.g. coat protein of a phage or LamB from E. coli
    OSP-PBD        Fusion of an OSP and PBD, order of fusion not specified
    OSTS           Outer Surface Transport Signal
    GP(x)          A genetic package containing the x gene
    GP(X)          A genetic package that displays X on its outer surface
    GP(osp-pbd)    GP containing an osp-pbd gene
    GP(OSP-PBD)    A genetic package that displays PBD on its outside as a fusion to OSP
    GP(pbd)        GP containing a pbd gene, osp implicit
    GP(PBD)        A genetic package displaying PBD on its outside, OSP unspecified
    {Q}            An affinity matrix supporting "Q", e.g. {T4 lysozyme} is T4 lysozyme attached to an affinity matrix
    AfM(W)         A molecule having affinity for "W", e.g. trypsin is an AfM(BPTI)
    AfM(W)*        AfM(W) carrying a label, e.g. ¹²⁵I
    XINDUCE        A chemical that can induce expression of a gene, e.g. IPTG for the lacUV5 promoter
    OCV            Operative Cloning Vector
    K_d            A bimolecular dissociation constant, K_d = [A][B]/[A:B]
    K_T            K_T = [T][SBD]/[T:SBD] (T is a target)
    K_N            K_N = [N][SBD]/[N:SBD] (N is a non-target)
    DoAMoM         Density of AfM(W) on affinity matrix
    mfaa           Most-Favored amino acid
    lfaa           Least-Favored amino acid
    Abun(x)        Abundance of DNA molecules encoding amino acid x
    OMP            Outer membrane protein
    nt             nucleotide
    SP-I           Signal-sequence Peptidase I
    Y_DQ           Yield of ssDNA up to Q bases long
    M_DNA          Maximum length of ssDNA that can be synthesized in acceptable yield
    Y_pl           Yield of plasmid DNA per volume of culture
    L_eff          DNA ligation efficiency
    M_ntv          Maximum number of transformants produced from Y_D100 DNA of Insert
    C_eff          Efficiency of chromatographic enrichment, enrichment per pass
    C_sensi        Sensitivity of chromatographic separation, can find 1 in N
    N_chrom        Maximum number of enrichment cycles per variegation cycle
    S_err          Error level in synthesizing vgDNA
    ::             in-frame genetic fusion or protein produced from in-frame fused gene
    ______________________________________

Single-letter codes for amino acids and nucleotides are given in Table 1.

II. THE INITIAL POTENTIAL BINDING DOMAIN (IPBD)

II.A. Generally

The initial potential binding domain may be: 1) a domain of a naturally occurring protein, 2) a non-naturally occurring domain which substantially corresponds in sequence to a naturally occurring domain, but which differs from it in sequence by one or more substitutions, insertions or deletions, 3) a domain substantially corresponding in sequence to a hybrid of subsequences of two or more naturally occurring proteins, or 4) an artificial domain designed entirely on theoretical grounds based on knowledge of amino acid geometries and statistical evidence of secondary structure preferences of amino acids. (However, the limitations of a priori protein design prompted the present invention.) Usually, the domain will be a known binding domain, or at least a homologue thereof, but it may be derived from a protein which, while not possessing a known binding activity, possesses a secondary or higher structure that lends itself to binding activity (clefts, grooves, etc.). The protein to which the IPBD is related need not have any specific affinity for the target material.

In determining whether sequences should be deemed to "substantially correspond", one should consider the following issues: the degree of sequence similarity when the sequences are aligned for best fit according to standard algorithms, the similarity in the connectivity patterns of any crosslinks (e.g., disulfide bonds), the degree to which the proteins have similar three-dimensional structures, as indicated by, e.g., X-ray diffraction analysis or NMR, and the degree to which the sequenced proteins have similar biological activity. In this context, it should be noted that among the serine protease inhibitors, there are families of proteins recognized to be homologous in which there are pairs of members with as little as 30% sequence homology.
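As a rough illustration of the first criterion, sequence similarity after alignment can be summarized as percent identity. The sketch below assumes the two sequences have already been aligned by some standard algorithm, with gaps written as "-". The function name, the gap symbol, and the choice of denominator (aligned columns in which neither sequence has a gap) are assumptions made for illustration and are not definitions taken from this specification.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are not counted (an assumption of this sketch)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned fragments, for illustration only.
print(f"{percent_identity('ACDE-FGH', 'ACQEKFGY'):.1f}% identical")
```

Other conventions for the denominator (for example, the length of the shorter sequence) give somewhat different percentages, so a figure such as 30% should be read together with the alignment method used.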

A candidate IPBD should meet the following criteria:

1) a domain exists that will remain stable under the conditions of its intended use (the domain may comprise the entire protein that will be inserted, e.g. BPTI (SEQ ID NO:144), α-conotoxin GI, or CMTI-III),

2) knowledge of the amino acid sequence is obtainable, and

3) a molecule is obtainable having specific and high affinity for the IPBD, AfM(IPBD).

Preferably, in order to guide the variegation strategy, knowledge of the identity of the residues on the domain's outer surface, and their spatial relationships, is obtainable; however, this consideration is less important if the binding domain is small, e.g., under 40 residues.

Preferably, the IPBD is no larger than necessary because small SBDs (for example, fewer than 30 amino acids) can be chemically synthesized and because it is easier to arrange restriction sites in smaller amino-acid sequences. For PBDs smaller than about 40 residues, an added advantage is that the entire variegated pbd gene can be synthesized in one piece. In that case, we need arrange only suitable restriction sites in the osp gene. A smaller protein minimizes the metabolic strain on the GP or the host of the GP. The IPBD is preferably smaller than about 200 residues. The IPBD must also be large enough to have acceptable binding affinity and specificity. For an IPBD lacking covalent crosslinks, such as disulfide bonds, the IPBD is preferably at least 40 residues; it may be as small as six residues if it contains a crosslink. These small, crosslinked IPBDs, known as "mini-proteins", are discussed in more detail later in this section.

Some candidate IPBDs, which meet the conditions set forth above, will be more suitable than others. Information about candidate IPBDs that will be used to judge the suitability of the IPBD includes: 1) a 3D structure (knowledge strongly preferred), 2) one or more sequences homologous to the IPBD (the more homologous sequences known, the better), 3) the pI of the IPBD (knowledge desirable when target is highly charged), 4) the stability and solubility as a function of temperature, pH and ionic strength (preferably known to be stable over a wide range and soluble in conditions of intended use), 5) ability to bind metal ions such as Ca⁺⁺ or Mg⁺⁺ (knowledge preferred; binding per se, no preference), 6) enzymatic activities, if any (knowledge preferred, activity per se has uses but may cause problems), 7) binding properties, if any (knowledge preferred, specific binding also preferred), 8) availability of a molecule having specific and strong affinity (K_(d) < 10⁻¹¹ M) for the IPBD (preferred), 9) availability of a molecule having specific and medium affinity (10⁻⁸ M < K_(d) < 10⁻⁶ M) for the IPBD (preferred), 10) the sequence of a mutant of IPBD that does not bind to the affinity molecule(s) (preferred), and 11) absorption spectrum in visible, UV, NMR, etc. (characteristic absorption preferred).

If only one species of molecule having affinity for IPBD (AfM(IPBD)) is available, it will be used to: a) detect the IPBD on the GP surface, b) optimize expression level and density of the affinity molecule on the matrix, and c) determine the efficiency and sensitivity of the affinity separation. As noted above, however, one would prefer to have available two species of AfM(IPBD), one with high and one with moderate affinity for the IPBD. The species with high affinity would be used in initial detection and in determining efficiency and sensitivity, and the species with moderate affinity would be used in optimization.

If the IPBD is not itself a binding domain of a known binding protein, or if its native target has not been purified, an antibody raised against the IPBD may be used as the affinity molecule. Use of an antibody for this purpose should not be taken to mean that the antibody is the ultimate target.

There are many candidate IPBDs for which all of the above information is available or is reasonably practical to obtain, for example, bovine pancreatic trypsin inhibitor (BPTI, 58 residues), CMTI-III (29 residues), crambin (46 residues), third domain of ovomucoid (56 residues), heat-stable enterotoxin (ST-Ia of E. coli) (18 residues), α-Conotoxin GI (13 residues), μ-Conotoxin GIII (22 residues), Conus King Kong mini-protein (27 residues), T4 lysozyme (164 residues), and azurin (128 residues). Structural information can be obtained from X-ray or neutron diffraction studies, NMR, chemical cross linking or labeling, modeling from known structures of related proteins, or from theoretical calculations. 3D structural information obtained by X-ray diffraction, neutron diffraction or NMR is preferred because these methods allow localization of almost all of the atoms to within defined limits. Table 50 lists several preferred IPBDs. Works related to determination of 3D structure of small proteins via NMR include: CHAZ85, PEAS90, PEAS88, CLOR86, CLOR87a, HEIT89, LECO87, WAGN79, and PARD89.

In some cases, a protein having some affinity for the target may be a preferred IPBD even though some other criteria are not optimally met. For example, the V1 domain of CD4 is a good choice as IPBD for a protein that binds to gp120 of HIV. It is known that mutations in the region 42 to 55 of V1 greatly affect gp120 binding and that other mutations either have much less effect or completely disrupt the structure of V1. Similarly, tumor necrosis factor (TNF) would be a good initial choice if one wants a TNF-like molecule having higher affinity for the TNF receptor.

Membrane-bound proteins are not preferred IPBDs, though they may serve as a source of outer surface transport signals. One should distinguish membrane-bound proteins, such as LamB or OmpF, that cross the membrane several times, forming a structure that is embedded in the lipid bilayer and in which the exposed regions are the loops that join trans-membrane segments, from non-embedded proteins, such as the soluble domains of CD4, that are simply anchored to the membrane. This is an important distinction because it is quite difficult to create a soluble derivative of a membrane-bound protein. Soluble binding proteins are in general more useful since purification is simpler and they are more tractable and more versatile assay reagents.

Most of the PBDs derived from a PPBD according to the process of the present invention will have been derived by variegation at residues having side groups directed toward the solvent. Reidhaar-Olson and Sauer (REID88a) found that exposed residues can accept a wide range of amino acids, while buried residues are more limited in this regard. Surface mutations typically have only small effects on melting temperature of the PBD, but may reduce the stability of the PBD. Hence the chosen IPBD should have a high melting temperature (50° C. acceptable, the higher the better; BPTI melts at 95° C.) and be stable over a wide pH range (8.0 to 3.0 acceptable; 11.0 to 2.0 preferred), so that the SBDs derived from the chosen IPBD by mutation and selection-through-binding will retain sufficient stability. Preferably, the substitutions in the IPBD yielding the various PBDs do not reduce the melting point of the domain below ≈40° C. Mutations may arise that increase the stability of SBDs relative to the IPBD, but the process of the present invention does not depend upon this occurring. Proteins containing covalent crosslinks, such as multiple disulfides, are usually sufficiently stable. A protein having at least two disulfides and having at least one disulfide per twenty residues may be presumed to be sufficiently stable.
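The disulfide rule of thumb stated above can be written as a simple check. This is a minimal sketch: the function name is hypothetical, the disulfide counts are supplied by the user, and a result of False means only that the presumption does not apply, not that the protein is unstable.

```python
def presumed_stable(n_residues: int, n_disulfides: int) -> bool:
    """Rule of thumb from the text: at least two disulfides and at least
    one disulfide per twenty residues."""
    if n_residues <= 0 or n_disulfides < 0:
        raise ValueError("n_residues must be positive and n_disulfides non-negative")
    # "one disulfide per twenty residues" is read here as 20 * disulfides >= residues
    return n_disulfides >= 2 and n_disulfides * 20 >= n_residues


print(presumed_stable(58, 3))   # BPTI: 58 residues, three disulfides
print(presumed_stable(164, 0))  # a 164-residue domain with no disulfides
```

A domain that fails this check may still be acceptable if its measured melting temperature and pH range meet the criteria given above.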

Two general characteristics of the target molecule, size and charge, make certain classes of IPBDs more likely than other classes to yield derivatives that will bind specifically to the target. Because these are very general characteristics, one can divide all targets into six classes: a) large positive, b) large neutral, c) large negative, d) small positive, e) small neutral, and f) small negative. A small collection of IPBDs, one or a few corresponding to each class of target, will contain a preferred candidate IPBD for any chosen target.

Alternatively, the user may elect to engineer a GP(IPBD) for a particular target; criteria are given below that relate target size and charge to the choice of IPBD.

II.B. Influence of target size on choice of IPBD:

If the target is a protein or other macromolecule, a preferred embodiment of the IPBD is a small protein such as the Cucurbita maxima trypsin inhibitor III (29 residues), BPTI from Bos taurus (58 residues), crambin from rape seed (46 residues), or the third domain of ovomucoid from Coturnix coturnix japonica (Japanese quail) (56 residues), because targets from this class have clefts and grooves that can accommodate small proteins in highly specific ways. If the target is a macromolecule lacking a compact structure, such as starch, it should be treated as if it were a small molecule. Extended macromolecules with defined 3D structure, such as collagen, should be treated as large molecules.

If the target is a small molecule, such as a steroid, a preferred embodiment of the IPBD is a protein of about 80-200 residues, such as ribonuclease from Bos taurus (124 residues), ribonuclease from Aspergillus oryzae (104 residues), hen egg white lysozyme from Gallus gallus (129 residues), azurin from Pseudomonas aeruginosa (128 residues), or T4 lysozyme (164 residues), because such proteins have clefts and grooves into which the small target molecules can fit. The Brookhaven Protein Data Bank contains 3D structures for all of the proteins listed. Genes encoding proteins as large as T4 lysozyme can be manipulated by standard techniques for the purposes of this invention.

If the target is a mineral, insoluble in water, one considers the nature of the molecular surface of the mineral. Minerals that have smooth surfaces, such as crystalline silicon, are best addressed with medium to large proteins, such as ribonuclease, as IPBD in order to have sufficient contact area and specificity. Minerals with rough, grooved surfaces, such as zeolites, could be bound either by small proteins, such as BPTI, or larger proteins, such as T4 lysozyme.

II.C. Influence of target charge on choice of IPBD:

Electrostatic repulsion between molecules of like charge can prevent molecules with highly complementary surfaces from binding. Therefore, it is preferred that, under the conditions of intended use, the IPBD and the target molecule either have opposite charge or that one of them is neutral. In some cases it has been observed that protein molecules bind in such a way that like charged groups are juxtaposed by including oppositely charged counter ions in the molecular interface. Thus, inclusion of counter ions can reduce or eliminate electrostatic repulsion and the user may elect to include ions in the eluants used in the affinity separation step. Polyvalent ions are more effective at reducing repulsion than monovalent ions.
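The charge preference stated above reduces to a sign test on net charges. The sketch below is illustrative only: the function name is hypothetical, net charge under the conditions of intended use is assumed to be known and expressed as a signed integer, and the example values are placeholders rather than measured charges.

```python
def charges_compatible(ipbd_charge: int, target_charge: int) -> bool:
    """True if IPBD and target have opposite net charge, or if either is neutral."""
    # The product is negative for opposite signs and zero if either is neutral.
    return ipbd_charge * target_charge <= 0


# Placeholder net charges, for illustration only.
print(charges_compatible(+5, -7))  # opposite charge: preferred
print(charges_compatible(0, +4))   # one partner neutral: preferred
print(charges_compatible(+5, +3))  # like charge: not preferred
```

A pair that fails this test is not excluded outright, since the text notes that counter ions in the interface can offset repulsion between like charges.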

II.D. Other considerations in the choice of IPBD:

If the chosen IPBD is an enzyme, it may be necessary to change one or more residues in the active site to inactivate enzyme function. For example, if the IPBD were T4 lysozyme and the GP were E. coli cells or M13, we would need to inactivate the lysozyme because otherwise it would lyse the cells. If, on the other hand, the GP were ΦX174, then inactivation of lysozyme may not be needed because T4 lysozyme can be overproduced inside E. coli cells without detrimental effects and ΦX174 forms intracellularly. It is preferred to inactivate enzyme IPBDs that might be harmful to the GP or its host by substituting mutant amino acids at one or more residues of the active site. It is permitted to vary one or more of the residues that were changed to abolish the original enzymatic activity of the IPBD. Those GPs that receive osp-pbd genes encoding an active enzyme may die, but the majority of sequences will not be deleterious.

If the binding protein is intended for therapeutic use in humans or animals, the IPBD may be chosen from proteins native to the designated recipient to minimize the possibility of antigenic reactions.

II.E. Bovine Pancreatic Trypsin Inhibitor (BPTI) as an IPBD

BPTI is an especially preferred IPBD because it meets or exceeds all the criteria: it is a small, very stable protein with a well known 3D structure. Marks et al. (MARK86) have shown that a fusion of the phoA signal peptide gene fragment and DNA coding for the mature form of BPTI caused native BPTI to appear in the periplasm of E. coli, demonstrating that there is nothing in the structure of BPTI to prevent its being secreted.

The structure of BPTI is maintained even when one or another of the disulfides is removed, either by chemical blocking or by genetic alteration of the amino-acid sequence. The stabilizing influence of the disulfides in BPTI is not equally distributed. Goldenberg (GOLD85) reports that blocking CYS14 and CYS38 lowers the Tm of BPTI to ≈75° C. while chemical blocking of either of the other disulfides lowers Tm to below 40° C. Chemically blocking a disulfide may lower Tm more than mutating the cysteines to other amino-acid types because the bulky blocking groups are more destabilizing than removal of the disulfide. Marks et al. (MARK87) replaced both CYS14 and CYS38 with either two alanines or two threonines. The CYS14/CYS38 cystine bridge that Marks et al. removed is the one very close to the scissile bond in BPTI; surprisingly, both mutant molecules functioned as trypsin inhibitors. Schnabel et al. (SCHN86) report preparation of aprotinin(C14A,C38A) by use of Raney nickel. Eigenbrot et al. (EIGE90) report the X-ray structure of BPTI(C30A/C51A), which is stable to at least 50° C. The backbone of this mutant is as similar to BPTI as are the backbones of BPTI molecules that sit in different crystal lattices. This indicates that BPTI is redundantly stable and so is likely to fold into approximately the same structure despite numerous surface mutations. Using the knowledge of homologues, vide infra, we can infer which residues should not be varied if the basic BPTI structure is to be maintained.

The 3D structure of BPTI has been determined at high resolution by X-ray diffraction (HUBE77, MARQ83, WLOD84, WLOD87a, WLOD87b), neutron diffraction (WLOD84), and by NMR (WAGN87). In one of the X-ray structures deposited in the Brookhaven Protein Data Bank, entry 6PTI, there was no electron density for A58, indicating that A58 has no uniquely defined conformation. Thus we know that the carboxy group does not make any essential interaction in the folded structure. The amino terminus of BPTI is very near to the carboxy terminus. Goldenberg and Creighton reported on circularized BPTI and circularly permuted BPTI (GOLD83). Some proteins homologous to BPTI have more or fewer residues at either terminus.

BPTI has been called "the hydrogen atom of protein folding" and has been the subject of numerous experimental and theoretical studies (STAT87, SCHW87, GOLD83, CHAZ83, CREI74, CREI77a, CREI77b, CREI80, SIEK87, SINH90, RUEH73, HUBE74, HUBE75, HUBE77 and others).

BPTI has the added advantage that at least 59 homologous proteins are known. Table 13 shows the sequences of 39 homologues. A tally of ionizable groups in 59 homologues is shown in Table 14 and the composite of amino acid types occurring at each residue is shown in Table 15.

BPTI is freely soluble and is not known to bind metal ions. BPTI has no known enzymatic activity. BPTI is not toxic.

All of the conserved residues are buried; of the six fully conserved residues only G37 has noticeable exposure. The solvent accessibility of each residue in BPTI is given in Table 16, which was calculated from the entry "6PTI" in the Brookhaven Protein Data Bank with a solvent radius of 1.4 Å, the atomic radii given in Table 7, and the method of Lee and Richards (LEEB71). Each of the 52 non-conserved residues can accommodate two or more kinds of amino acids. By independently substituting at each residue only those amino acids already observed at that residue, we could obtain approximately 1.6·10⁴³ different amino acid sequences, most of which will fold into structures very similar to BPTI.
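The sequence count quoted above is the product, over the 52 non-conserved residues, of the number of amino acid types observed at each residue. The per-residue counts come from Table 15 and are not reproduced here, so the sketch below only works backward from the quoted total to the average (geometric mean) number of alternatives per position that it implies. The function name is hypothetical.

```python
import math


def mean_alternatives_per_position(total_sequences: float, n_positions: int) -> float:
    """Geometric mean of the per-position counts whose product is total_sequences."""
    if total_sequences <= 0 or n_positions <= 0:
        raise ValueError("both arguments must be positive")
    return total_sequences ** (1.0 / n_positions)


total = 1.6e43   # figure quoted in the text
positions = 52   # non-conserved residues, per the text

print(f"log10(total sequences) = {math.log10(total):.2f}")
print(f"implied mean alternatives per position = "
      f"{mean_alternatives_per_position(total, positions):.2f}")
```

Under these figures, the total corresponds to somewhat fewer than seven observed amino acid types per variable position on average.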

BPTI will be especially useful as an IPBD for macromolecular targets. BPTI and BPTI homologues bind tightly and with high specificity to a number of enzyme macromolecules.

BPTI is strongly positively charged except at very high pH, thus BPTI is useful as IPBD for targets that are not also strongly positive under the conditions of intended use. There exist homologues of BPTI, however, having quite different charges (viz. SCI-III from Bombyx mori at -7 and the trypsin inhibitor from bovine colostrum at -1). Once a genetic package is found that displays BPTI on its surface, the sequence of the BPTI domain can be replaced by one of the homologous sequences to produce acidic or neutral IPBDs.

BPTI is quite small; if this should cause a pharmacological problem, two or more BPTI-derived domains may be joined as in human BPTI homologues, one of which has two domains (BALD85, ALBR83b) and another has three (WUNT88).

Another possible pharmacological problem is immunogenicity. BPTI has been used in humans with very few adverse effects. Siekmann et al. (SIEK89) have studied immunological characteristics of BPTI and some homologues. It is an advantage of the method of the present invention that a variety of SBDs can be obtained so that, if one derivative proves to be antigenic, a different SBD may be used. Furthermore, one can reduce the probability of immune response by starting with a human protein, such as LACI (a BPTI homologue) (WUNT88, GIRA89) or Inter-α-Trypsin Inhibitor (ALBR83a, ALBR83b, DIAR90, ENGH89, TRIB86, GEBH86, GEBH90, KAUM86, ODOM90, SALI90).

Further, a BPTI-derived gene fragment, coding for a novel binding domain, could be fused in-frame to a gene fragment coding for other proteins, such as serum albumin or the constant parts of IgG.

Tschesche et al. (TSCH87) reported on the binding of several BPTI derivatives to various proteases:

Dissociation constants for BPTI derivatives, Molar.

    _________________________________________________________________
                Trypsin      Chymotrypsin   Elastase     Elastase
    Residue     (bovine      (bovine        (porcine     (human
    #15         pancreas)    pancreas)      pancreas)    leukocytes)
    _________________________________________________________________
    lysine      6.0·10⁻¹⁴    9.0·10⁻⁹       -            3.5·10⁻⁶
    glycine     -            -              +            7.0·10⁻⁹
    alanine     +            -              2.8·10⁻⁸     2.5·10⁻⁹
    valine      -            -              5.7·10⁻⁸     1.1·10⁻¹⁰
    leucine     -            -              1.9·10⁻⁸     2.9·10⁻⁹
    _________________________________________________________________

From the report of Tschesche et al. we infer that molecular pairs marked "+" have K_(d)s ≥ 3.5·10⁻⁶ M and that molecular pairs marked "-" have K_(d)s >> 3.5·10⁻⁶ M. Because of the wealth of data about the binding of BPTI and various mutants to trypsin and other proteases (TSCH87), we can proceed in various ways in optimizing the affinity separation conditions. (For other PBDs, we can obtain two different monoclonal antibodies, one with a high affinity having K_(d) of order 10⁻¹¹ M, and one with a moderate affinity having K_(d) on the order of 10⁻⁶ M.)
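To compare the tabulated dissociation constants on an energy scale, one can convert each K_(d) to a standard binding free energy using ΔG° = RT ln K_(d). The sketch below is an added illustration, not part of the report of Tschesche et al.: the temperature of 298.15 K and the 1 M standard state are assumptions, and the function name is hypothetical. The K_(d) values are the lysine-15 entries from the table above.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard free energy of binding, RT*ln(Kd), in kcal/mol (1 M standard state)."""
    if kd_molar <= 0 or temp_k <= 0:
        raise ValueError("kd_molar and temp_k must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


# Lysine-15 (native BPTI) entries from the table; temperature is assumed.
kd_lys15 = {
    "trypsin (bovine pancreas)": 6.0e-14,
    "chymotrypsin (bovine pancreas)": 9.0e-9,
    "elastase (human leukocytes)": 3.5e-6,
}
for protease, kd in kd_lys15.items():
    print(f"{protease}: {binding_free_energy(kd):.1f} kcal/mol")
```

More negative values indicate tighter binding, so the spread between the trypsin and leukocyte elastase entries shows how much room there is between a strong and a moderate affinity reagent when tuning the separation.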

Works concerning BPTI and its homologues include: KIDO88, PONT88, KIDO90, AUER87, AUER90, SCOT87b, AUER88, AUER89, BECK88b, WACH79, WACH80, BECK89a, DUFT85, FIOR88, GIRA89, GOLD84, GOLD88, HOCH84, RITO83, NORR89a, NORR89b, OLTE89, SWAI88, and WAGN79.

II.F. Mini-Proteins as IPBDs

A polypeptide is a polymer composed of a single chain of the same or different amino acids joined by peptide bonds. Linear peptides can take up a very large number of different conformations through internal rotations about the main chain single bonds of each α carbon. These rotations are hindered to varying degrees by side groups, with glycine interfering the least, and valine, isoleucine and, especially, proline, the most. A polypeptide of 20 residues may have 10²⁰ different conformations which it may assume by various internal rotations.

Proteins are polypeptides which, as a result of stabilizing interactions between amino acids that are not in adjacent positions in the chain, have folded into a well-defined conformation. This folding is usually essential to their biological activity.

For polypeptides of 40-60 residues or longer, noncovalent forces such as hydrogen bonds, salt bridges, and hydrophobic "interactions" are sufficient to stabilize a particular folding or conformation. The polypeptide's constituent segments are held to more or less that conformation unless it is perturbed by a denaturant such as rising temperature or decreasing pH, whereupon the polypeptide unfolds or "melts". The smaller the peptide, the more likely it is that its conformation will be determined by the environment. If a small unconstrained peptide has biological activity, the peptide ligand will be in essence a random coil until it comes into proximity with its receptor. The receptor accepts the peptide only in one or a few conformations because alternative conformations are disfavored by unfavorable van der Waals and other non-covalent interactions.

Small polypeptides have potential advantages over larger polypeptides when used as therapeutic or diagnostic agents, including (but not limited to):

a) better penetration into tissues,

b) faster elimination from the circulation (important for imaging agents),

c) lower antigenicity, and

d) higher activity per mass.

Moreover, polypeptides of under about 50 residues have the advantage of accessibility via chemical synthesis; polypeptides of under about 30 residues are more easily synthesized than are larger polypeptides. Thus, it would be desirable to be able to employ the combination of variegation and affinity selection to identify small polypeptides which bind a target of choice.

Polypeptides of this size, however, have disadvantages as binding molecules. According to Olivera et al. (OLIV90a): "Peptides in this size range normally equilibrate among many conformations (in order to have a fixed conformation, proteins generally have to be much larger)." Specific binding of a peptide to a target molecule requires the peptide to take up one conformation that is complementary to the binding site. For a decapeptide with three isoenergetic conformations (e.g., β strand, α helix, and reverse turn) at each residue, there are about 6·10⁴ possible overall conformations (3¹⁰ = 59,049). Assuming these conformations to be equi-probable for the unconstrained decapeptide, if only one of the possible conformations bound to the binding site, then the affinity of the peptide for the target would be expected to be about 6·10⁴-fold higher if it could be constrained to that single effective conformation. Thus, the unconstrained decapeptide, relative to a decapeptide constrained to the correct conformation, would be expected to exhibit lower affinity. It would also exhibit lower specificity, since one of the other conformations of the unconstrained decapeptide might be one which bound tightly to a material other than the intended target. By way of corollary, it could have less resistance to degradation by proteases, since it would be more likely to provide a binding site for the protease.
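The decapeptide estimate above can be checked with a few lines of arithmetic. The conformation count follows directly from the text (three states at each of ten residues). The conversion of the affinity ratio into a free energy penalty, RT ln N, is an added illustration that assumes equiprobable conformations, as the text does, and a temperature of 298.15 K; the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def conformation_count(n_residues: int, states_per_residue: int) -> int:
    """Number of overall conformations if each residue independently adopts one state."""
    return states_per_residue ** n_residues


def conformational_penalty(n_conformations: int, temp_k: float = 298.15) -> float:
    """Free energy cost (kcal/mol) of selecting one of N equiprobable conformations."""
    return R_KCAL * temp_k * math.log(n_conformations)


n = conformation_count(10, 3)  # 3**10 = 59049, i.e. about 6e4
print(f"conformations: {n}")
print(f"expected affinity ratio (constrained / unconstrained): about {n:.1e}")
print(f"equivalent free energy penalty: {conformational_penalty(n):.1f} kcal/mol")
```

The same function applied with 10 states per residue and 20 residues reproduces the 10²⁰ figure given earlier for a 20-residue polypeptide.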

In one embodiment, the present invention overcomes these problems, while retaining the advantages of smaller polypeptides, by fostering the biosynthesis of novel mini-proteins having the desired binding characteristics. Mini-Proteins are small polypeptides (usually less than about 60 residues) which, while too small to have a stable conformation as a result of noncovalent forces alone, are covalently crosslinked (e.g., by disulfide bonds) into a stable conformation and hence have biological activities more typical of larger protein molecules than of unconstrained polypeptides of comparable size.

When mini-proteins are variegated, the residues which are covalently crosslinked in the parental molecule are left unchanged, thereby stabilizing the conformation. For example, in the variegation of a disulfide bonded mini-protein, certain cysteines are invariant so that under the conditions of expression and display, covalent crosslinks (e.g., disulfide bonds between one or more pairs of cysteines) form, and substantially constrain the conformation which may be adopted by the hypervariable linearly intermediate amino acids. In other words, a constraining scaffolding is engineered into polypeptides which are otherwise extensively randomized.

Once a mini-protein of desired binding characteristics is characterized, it may be produced, not only by recombinant DNA techniques, but also by nonbiological synthetic methods.

In vitro, disulfide bridges can form spontaneously in polypeptides as a result of air oxidation. Matters are more complicated in vivo. Very few intracellular proteins have disulfide bridges, probably because a strong reducing environment is maintained by the glutathione system. Disulfide bridges are common in proteins that travel or operate in extracellular spaces, such as snake venoms and other toxins (e.g., conotoxins, charybdotoxin, bacterial enterotoxins), peptide hormones, digestive enzymes, complement proteins, immunoglobulins, lysozymes, protease inhibitors (BPTI and its homologues, CMTI-III (Cucurbita maxima trypsin inhibitor III) and its homologues, hirudin, etc.) and milk proteins.

Disulfide bonds that close tight intrachain loops have been found in pepsin, thioredoxin, insulin A-chain, silk fibroin, and lipoamide dehydrogenase. The bridged cysteine residues are separated by one to four residues along the polypeptide chain. Model building, X-ray diffraction analysis, and NMR studies have shown that the α carbon path of such loops is usually flat and rigid.

There are two types of disulfide bridges in immunoglobulins. One is the conserved intrachain bridge, spanning about 60 to 70 amino acid residues and found, repeatedly, in almost every immunoglobulin domain. Buried deep between the opposing β sheets, these bridges are shielded from solvent and ordinarily can be reduced only in the presence of denaturing agents. The remaining disulfide bridges are mainly interchain bonds and are located on the surface of the molecule; they are accessible to solvent and relatively easily reduced (STEI85). The disulfide bridges of the mini-proteins of the present invention are intrachain linkages between cysteines having much smaller chain spacings.

For the purpose of the appended claims, a mini-protein has between about eight and about sixty residues. However, it will be understood that a chimeric surface protein presenting a mini-protein as a domain will normally have more than sixty residues. Polypeptides containing intrachain disulfide bonds may be characterized as cyclic in nature, since a closed circle of covalently bonded atoms is defined by the two cysteines, the intermediate amino acid residues, their peptidyl bonds, and the disulfide bond. The terms "cycle", "span" and "segment" will be used to define certain structural features of the polypeptides. An intrachain disulfide bridge connecting amino acids 3 and 8 of a 16 residue polypeptide will be said herein to have a cycle of 6 and a span of 4. If amino acids 4 and 12 are also disulfide bonded, then they form a second cycle of 9 with a span of 7. Together, the four cysteines divide the polypeptide into four intercysteine segments (1-2, 5-7, 9-11, and 13-16). (Note that there is no segment between Cys3 and Cys4.)
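The definitions of cycle, span and segment given above can be expressed directly in terms of residue numbers. The sketch below reproduces the 16-residue example from the text; the function names are hypothetical, and residue numbering is assumed to start at 1.

```python
def cycle_and_span(cys_a: int, cys_b: int) -> tuple:
    """Cycle counts both bridged cysteines; span counts only the residues between them."""
    lo, hi = sorted((cys_a, cys_b))
    return hi - lo + 1, hi - lo - 1


def intercysteine_segments(length: int, cysteines: list) -> list:
    """Runs of non-cysteine residues, as (first, last) pairs, numbered from 1."""
    segments = []
    start = 1
    # A sentinel one past the C-terminus closes the final segment.
    for pos in sorted(cysteines) + [length + 1]:
        if pos > start:
            segments.append((start, pos - 1))
        start = pos + 1
    return segments


print(cycle_and_span(3, 8))    # first bridge in the example
print(cycle_and_span(4, 12))   # second bridge in the example
print(intercysteine_segments(16, [3, 4, 8, 12]))
```

Adjacent cysteines, such as Cys3 and Cys4 in the example, produce no segment between them, which the loop handles by skipping empty runs.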

The connectivity pattern of a crosslinked mini-protein is a simple description of the relative location of the termini of the crosslinks. For example, for a mini-protein with two disulfide bonds, the connectivity pattern "1-3, 2-4" means that the first crosslinked cysteine is disulfide bonded to the third crosslinked cysteine (in the primary sequence), and the second to the fourth.
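To show what the connectivity notation distinguishes, the sketch below enumerates every way of pairing a given number of crosslinked cysteines. It assumes that every cysteine participates in exactly one disulfide; the function name is hypothetical and the enumeration is an added illustration, not a procedure from this specification.

```python
def connectivity_patterns(n_cysteines: int) -> list:
    """All pairings of cysteines 1..n, written in the "1-3, 2-4" notation."""
    if n_cysteines < 0 or n_cysteines % 2:
        raise ValueError("n_cysteines must be a non-negative even number")

    def pairings(items):
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        # Pair the lowest-numbered free cysteine with each possible partner.
        for k, partner in enumerate(rest):
            remaining = rest[:k] + rest[k + 1:]
            for tail in pairings(remaining):
                yield [(first, partner)] + tail

    cysteines = list(range(1, n_cysteines + 1))
    return [", ".join(f"{a}-{b}" for a, b in p) for p in pairings(cysteines)]


for pattern in connectivity_patterns(4):
    print(pattern)
print(len(connectivity_patterns(6)), "patterns for three disulfides")
```

With two disulfides there are three possible patterns, of which "1-3, 2-4" is one; with three disulfides there are fifteen, which is why the pattern is a useful criterion when judging whether two sequences substantially correspond.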

The degree to which the crosslink constrains the conformational freedom of the mini-protein, and the degree to which it stabilizes the mini-protein, may be assessed by a number of means. These include absorption spectroscopy (which can reveal whether an amino acid is buried or exposed), circular dichroism studies (which provide a general picture of the helical content of the protein), nuclear magnetic resonance spectroscopy (which reveals the number of nuclei in a particular chemical environment as well as the mobility of nuclei), and X-ray or neutron diffraction analysis of protein crystals. The stability of the mini-protein may be ascertained by monitoring the changes in absorption at various wavelengths as a function of temperature, pH, etc.; buried residues become exposed as the protein unfolds. Similarly, the unfolding of the mini-protein as a result of denaturing conditions results in changes in NMR line positions and widths. Circular dichroism (CD) spectra are extremely sensitive to conformation.

The variegated disulfide-bonded mini-proteins of the present invention fall into several classes.

Class I mini-proteins are those featuring a single pair of cysteines capable of interacting to form a disulfide bond, said bond having a span of no more than nine residues. This disulfide bridge preferably has a span of at least two residues; this is a function of the geometry of the disulfide bond. When the spacing is two or three residues, one residue is preferably glycine in order to reduce the strain on the bridged residues. The upper limit on spacing is less precise; however, in general, the greater the spacing, the less the constraint on conformation imposed on the linearly intermediate amino acid residues by the disulfide bond.

The main chain of such a peptide has very little freedom, but is not stressed. The free energy released when the disulfide forms exceeds the free energy lost by the main chain when locked into a conformation that brings the cysteines together. Having lost the free energy of disulfide formation, the proximal ends of the side groups are held in more or less fixed relation to each other. When binding to a target, the domain does not need to expend free energy getting into the correct conformation. The domain cannot jump into some other conformation and bind a non-target.

A disulfide bridge with a span of 4 or 5 is especially preferred. If the span is increased to 6, the constraining influence is reduced. In this case, we prefer that at least one of the enclosed residues be an amino acid that imposes restrictions on the main-chain geometry. Proline imposes the most restriction. Valine and isoleucine restrict the main chain to a lesser extent. The preferred position for this constraining non-cysteine residue is adjacent to one of the invariant cysteines; however, it may be one of the other bridged residues. If the span is seven, we prefer to include two amino acids that limit main-chain conformation. These amino acids could be at any of the seven positions, but are preferably the two bridged residues that are immediately adjacent to the cysteines. If the span is eight or nine, additional constraining amino acids may be provided.

The disulfide bond of a class I mini-protein is exposed to solvent. Thus, one should avoid exposing the variegated population of GPs that display class I mini-proteins to reagents that rupture disulfides; Creighton names several such reagents (CREI88).

Class II mini-proteins are those featuring a single disulfide bond having a span of greater than nine amino acids. The bridged amino acids form secondary structures which help to stabilize their conformation. Preferably, these intermediate amino acids form hairpin supersecondary structures such as those schematized below: ##STR1##

Secondary structures are stabilized by hydrogen bonds between amide nitrogen and carbonyl groups, by interactions between charged side groups and helix dipoles, and by van der Waals contacts. One abundant secondary structure in proteins is the α-helix. The α helix has 3.6 residues per turn, a 1.5 Å rise per residue, and a helical radius of 2.3 Å. All observed α-helices are right-handed. The torsion angles φ (-57°) and ψ (-47°) are favorable for most residues, and the hydrogen bond between the backbone carbonyl oxygen of each residue and the backbone NH of the fourth residue along the chain is 2.86 Å long (nearly the optimal distance) and virtually straight. Since the hydrogen bonds all point in the same direction, the α helix has a considerable dipole moment (carboxy terminus negative).

The β strand may be considered an elongated helix with 2.3 residues per turn, a translation of 3.3 Å per residue, and a helical radius of 1.0 Å. Alone, a β strand forms no main-chain hydrogen bonds. Most commonly, β strands are found in twisted (rather than planar) parallel, antiparallel, or mixed parallel/antiparallel sheets.

A peptide chain can form a sharp reverse turn. A reverse turn may be accomplished with as few as four amino acids. Reverse turns are very abundant, comprising a quarter of all residues in globular proteins. In proteins, reverse turns commonly connect β strands to form β sheets, but may also form other connections. A peptide can also form other turns that are less sharp.

Based on studies of known proteins, one may calculate the propensity of a particular residue, or of a particular dipeptide or tripeptide, to be found in an α helix, β strand or reverse turn. The normalized frequencies of occurrence of the amino acid residues in these secondary structures are given in Table 6-4 of CREI84. For a more detailed treatment of the prediction of secondary structure from the amino acid sequence, see Chapter 6 of SCHU79.

In designing a suitable hairpin structure, one may copy an actual structure from a protein whose three-dimensional conformation is known, design the structure using frequency data, or combine the two approaches. Preferably, one or more actual structures are used as a model, and the frequency data are used to determine which mutations can be made without disrupting the structure.

Preferably, no more than three amino acids lie between the cysteine and the beginning or end of the α helix or β strand.

More complex structures (such as a double hairpin) are also possible.

Class III mini-proteins are those featuring a plurality of disulfide bonds. They optionally may also feature secondary structures such as those discussed above with regard to Class II mini-proteins. Since the number of possible disulfide bond topologies increases rapidly with the number of bonds (two bonds, three topologies; three bonds, 15 topologies; four bonds, 105 topologies), the number of disulfide bonds preferably does not exceed four. With two or more disulfide bonds, the disulfide bridge spans preferably do not exceed 50, and the largest intercysteine chain segment preferably does not exceed 20.
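The topology counts quoted above follow from counting the ways to pair 2n cysteines into n disulfides, which is the product of the odd numbers up to 2n-1. The following Python sketch is illustrative only and is not part of the original disclosure; the function name `disulfide_topologies` is a hypothetical helper introduced here.

```python
def disulfide_topologies(n_bonds: int) -> int:
    """Count the ways to pair 2*n_bonds cysteines into n_bonds disulfides.

    The first cysteine can pair with any of the other (2n - 1) cysteines;
    the next unpaired cysteine with any of the remaining (2n - 3); and so on.
    """
    if n_bonds < 0:
        raise ValueError("n_bonds must be non-negative")
    count = 1
    for k in range(1, n_bonds + 1):
        count *= 2 * k - 1  # multiply the odd numbers 1, 3, 5, ...
    return count


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, "bond(s):", disulfide_topologies(n), "topologies")
```

For one to four bonds this gives 1, 3, 15, and 105 pairings, in agreement with the figures in the text.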

Naturally occurring class III mini-proteins, such as heat-stable enterotoxin ST-Ia, frequently have pairs of cysteines that are adjacent in the amino-acid sequence. Adjacent cysteines are very unlikely to form an intramolecular disulfide, and cysteines separated by a single amino acid form an intramolecular disulfide with difficulty and only for certain intervening amino acids. Thus, clustering cysteines within the amino-acid sequence reduces the number of realizable disulfide bonding schemes. We utilize such clustering in the class III mini-protein disclosed herein.

Metal Finger Mini-Proteins. The mini-proteins of the present invention are not limited to those crosslinked by disulfide bonds. Another important class of mini-proteins comprises analogues of finger proteins. Finger proteins are characterized by finger structures in which a metal ion is coordinated by two Cys and two His residues, forming a tetrahedral arrangement around it. The metal ion is most often zinc(II), but may be iron, copper, cobalt, etc. The "finger" has the consensus sequence (Phe or Tyr)-(1 AA)-Cys-(2-4 AAs)-Cys-(3 AAs)-Phe-(5 AAs)-Leu-(2 AAs)-His-(3 AAs)-His-(5 AAs) (SEQ ID NO: 1,2,3,4,5,6) (BERG88; GIBS88). While finger proteins typically contain many repeats of the finger motif, it is known that a single finger will fold in the presence of zinc ions (FRAN87; PARR88). There is some dispute as to whether two fingers are necessary for binding to DNA. The present invention encompasses mini-proteins with either one or two fingers. It is to be understood that the target need not be a nucleic acid.

G. Modified PBSs

There exist a number of enzymes and chemical reagents that can selectively modify certain side groups of proteins, including: protein-tyrosine kinase, Ellman's reagent, methyl transferases (that methylate GLU side groups), serine kinases, proline hydroxylases, vitamin-K dependent enzymes that convert GLU to GLA, maleic anhydride, and alkylating agents. Treatment of the variegated population of GP(PBD)s with one of these enzymes or reagents will modify the side groups affected by the chosen enzyme or reagent. Enzymes and reagents that do not kill the GP are much preferred. Such modification of side groups can directly affect the binding properties of the displayed PBDs. Using affinity separation methods, we enrich for the modified GPs that bind the predetermined target. Since the active binding domain is not entirely genetically specified, we must repeat the post-morphogenesis modification at each enrichment round. This approach is particularly appropriate with mini-protein IPBDs because we envision chemical synthesis of these SBDs.

III. VARIEGATION STRATEGY--MUTAGENESIS TO OBTAIN POTENTIAL BINDING DOMAINS WITH DESIRED DIVERSITY

III.A. Generally

Using standard genetic engineering techniques, a molecule of variegated DNA can be introduced into a vector so that it constitutes part of a gene (OLIP86, OLIP87, AUSU87, REID88a). When vectors containing variegated DNA are used to transform bacteria, each cell makes a version of the original protein. Each colony of bacteria may produce a different version from any other colony. If the variegations of the DNA are concentrated at loci known to be on the surface of the protein or in a loop, a population of proteins will be generated, many members of which will fold into roughly the same 3D structure as the parent protein. The specific binding properties of each member, however, may be different from each other member.

We now consider the manner in which we generate a diverse population of potential binding domains in order to facilitate selection of a PBD-bearing GP which binds with the requisite affinity to the target of choice. The potential binding domains are first designed at the amino acid level. Once we have identified which residues are to be mutagenized, and which mutations to allow at those positions, we may then design the variegated DNA which is to encode the various PBDs so as to assure that there is a reasonable probability that if a PBD has an affinity for the target, it will be detected. Of course, the number of independent transformants obtained and the sensitivity of the affinity separation technology will impose limits on the extent of variegation possible within any single round of variegation.

There are many ways to generate diversity in a protein. (See RICH86, CARU85, and OLIP86.) At one extreme, we vary a few residues of the protein as much as possible (inter alia see CARU85, CARU87, RICH86, and WHAR86). We will call this approach "Focused Mutagenesis". A typical "Focused Mutagenesis" strategy is to pick a set of five to seven residues and vary each through 13-20 possibilities. An alternative plan of mutagenesis ("Diffuse Mutagenesis") is to vary many more residues through a more limited set of choices (See VERS86a and PAKU86). The variegation pattern adopted may fall between these extremes, e.g., two residues varied through all twenty amino acids, two more through only two possibilities, and a fifth into ten of the twenty amino acids.

There is no fixed limit on the number of codons which can be mutated simultaneously. However, it is desirable to adopt a mutagenesis strategy which results in a reasonable probability that a possible PBD sequence is in fact displayed by at least one genetic package. When the size of the set of amino acids potentially encoded by each variable codon is the same for all variable codons and within the set all amino acids are equiprobable, this probability may be calculated as follows: Let Γ(k,q) be the probability that amino acid number k will occur at variegated codon q; these codons need not be contiguous. The probability that a particular vgDNA molecule will encode a PBD containing n variegated amino acids k₁, . . . , k_(n) is:

$$p(k_1, \ldots, k_n) = \Gamma(k_1, 1) \cdot \ldots \cdot \Gamma(k_n, n)$$

Consider a library of N_(it) independent transformants prepared with said vgDNA; the probability that the sequence k₁, . . . , k_(n) is absent is:

$$P(\text{missing } k_1, \ldots, k_n) = \exp\{-N_{it} \cdot p(k_1, \ldots, k_n)\}$$

and the probability that it is present in the library is therefore:

$$P(k_1, \ldots, k_n \text{ in lib}) = 1 - \exp\{-N_{it} \cdot p(k_1, \ldots, k_n)\}$$

Preferably, the probability that a mutein encoded by the vgDNA and composed of the least favored amino acids at each variegated position will be displayed by at least one independent transformant in the library is at least 0.50, and more preferably at least 0.90. (Muteins composed of more favored amino acids would of course be more likely to occur in the same library.)
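The probability expressions above can be evaluated numerically. The following Python sketch is illustrative only; the function names and the example values (five variegated codons, with the least favored amino acid assumed to arise from 1 of 32 equiprobable codons at each position) are assumptions made for this illustration and are not taken from the disclosure.

```python
import math


def sequence_probability(gammas):
    """p(k_1..k_n): product of the per-codon probabilities Gamma(k_q, q)."""
    return math.prod(gammas)


def probability_in_library(n_it, p_seq):
    """P(sequence in lib) = 1 - exp(-N_it * p), computed stably via expm1."""
    return -math.expm1(-n_it * p_seq)


def transformants_needed(p_seq, target=0.90):
    """Smallest whole N_it for which the presence probability reaches target."""
    return math.ceil(-math.log1p(-target) / p_seq)


if __name__ == "__main__":
    # Assumed example: least favored amino acid has probability 1/32 per codon.
    p_least = sequence_probability([1 / 32] * 5)
    for target in (0.50, 0.90):
        n_it = transformants_needed(p_least, target)
        print(f"target {target:.2f}: about {n_it:.3g} independent transformants")
```

Under these assumptions the required number of independent transformants scales as 1/p, so each additional fully variegated codon multiplies the library size needed to reach the 0.50 or 0.90 threshold.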

Preferably, the variegation is such as will cause a typical transformant population to display 10⁶-10⁷ different amino acid sequences by means of preferably not more than 10-fold more (more preferably not more than 3-fold) different DNA sequences.

For a mini-protein that lacks α helices and β strands, one will, in any given round of mutation, preferably variegate each of 4-6 non-cysteine codons so that they each encode at least eight of the 20 possible amino acids. The variegation at each codon could be customized to that position. Preferably, cysteine is not one of the potential substitutions, though it is not excluded.

When the mini-protein is a metal finger protein, in a typical variegation strategy, the two Cys and two His residues, and optionally also the aforementioned Phe/Tyr, Phe and Leu residues, are held invariant and a plurality (usually 5-10) of the other residues are varied.

When the mini-protein is of the type featuring one or more α helices and β strands, the set of potential amino acid modifications at any given position is picked to favor those which are less likely to disrupt the secondary structure at that position. Since the number of possibilities at each variable amino acid is more limited, the total number of variable amino acids may be greater without altering the sampling efficiency of the selection process.

For the last-mentioned class of mini-proteins, as well as domains other than mini-proteins, preferably not more than 20 and more preferably 5-10 codons will be variegated. However, if diffuse mutagenesis is employed, the number of codons which are variegated can be higher.

The decision as to which residues to modify is eased by knowledge of which residues lie on the surface of the domain and which are buried in the interior.

We choose residues in the IPBD to vary through consideration of several factors, including: a) the 3D structure of the IPBD, b) sequences homologous to IPBD, and c) modeling of the IPBD and mutants of the IPBD. When the number of residues that could strongly influence binding is greater than the number that should be varied simultaneously, the user should pick a subset of those residues to vary at one time. The user picks trial levels of variegation and calculates the abundances of various sequences. The list of varied residues and the level of variegation at each varied residue are adjusted until the composite variegation is commensurate with the sensitivity of the affinity separation and the number of independent transformants that can be made.

Preferably, the abundance of PPBD-encoding DNA is 3 to 10 times higher than both 1/M_(ntv) and 1/C_(sensi) to provide a margin of redundancy. M_(ntv) is the number of transformants that can be made from Y_(D100) DNA. With current technology M_(ntv) is approximately 5·10⁸, but the exact value depends on the details of the procedures adopted by the user. Improvements in technology that allow more efficient a) synthesis of DNA, b) ligation of DNA, or c) transformation of cells will raise the value of M_(ntv). C_(sensi) is the sensitivity of the affinity separation; improvements in affinity separation will raise C_(sensi). If the smaller of M_(ntv) and C_(sensi) is increased, higher levels of variegation may be used. For example, if C_(sensi) is 1 in 10⁹ and M_(ntv) is 10⁸, then improvements in C_(sensi) are less valuable than improvements in M_(ntv).
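One possible reading of this guideline, for the simplified case in which all encoded sequences are equally abundant, is sketched below in Python. If each of V sequences has abundance 1/V, requiring 1/V to exceed both 1/M_(ntv) and 1/C_(sensi) by a chosen margin gives V no greater than min(M_(ntv), C_(sensi)) divided by that margin. The function names and the equal-abundance simplification are assumptions made for illustration only.

```python
def limiting_factor(m_ntv, c_sensi):
    """Name the quantity that currently caps the usable level of variegation."""
    return "M_ntv" if m_ntv <= c_sensi else "C_sensi"


def max_equiprobable_variants(m_ntv, c_sensi, margin=3.0):
    """Largest number of equally abundant sequences V such that the abundance
    1/V is at least `margin` times both 1/m_ntv and 1/c_sensi."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    return min(m_ntv, c_sensi) / margin


if __name__ == "__main__":
    # Values from the example in the text: C_sensi is 1 in 10^9, M_ntv is 10^8.
    m_ntv, c_sensi = 1e8, 1e9
    print("limiting factor:", limiting_factor(m_ntv, c_sensi))
    for margin in (3.0, 10.0):
        v = max_equiprobable_variants(m_ntv, c_sensi, margin)
        print(f"margin {margin:g}: up to about {v:.3g} variants")
```

With the values of the example, M_(ntv) is the limiting quantity, which is why raising C_(sensi) alone would not permit a higher level of variegation.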

While variegation normally will involve the substitution of one amino acid for another at a designated variable codon, it may involve the insertion or deletion of amino acids as well.

III.B. Identification of Residues to be Varied

We now consider the principles that guide our choice of residues of the IPBD to vary. A key concept is that only structured proteins exhibit specific binding, i.e. can bind to a particular chemical entity to the exclusion of most others. Thus the residues to be varied are chosen with an eye to preserving the underlying IPBD structure. Substitutions that prevent the PBD from folding will cause GPs carrying those genes to bind indiscriminately so that they can easily be removed from the population.

Sauer and colleagues (PAKU86, REID88a), and Caruthers and colleagues (EISE85) have shown that some residues on the polypeptide chain are more important than others in determining the 3D structure of a protein. The 3D structure is essentially unaffected by the identity of the amino acids at some loci; at other loci only one or a few types of amino acid are allowed. In most cases, loci where wide variety is allowed have the amino acid side group directed toward the solvent. Loci where limited variety is allowed frequently have the side group directed toward other parts of the protein. Thus substitutions of amino acids that are exposed to solvent are less likely to affect the 3D structure than are substitutions at internal loci. (See also SCHU79, p169-171 and CREI84, p239-245, 314-315).

The residues that join helices to helices, helices to sheets, and sheets to sheets are called turns and loops and have been classified by Richardson (RICH81), Thornton (THOR88), Sutcliffe et al. (SUTC87a) and others. Insertions and deletions are more readily tolerated in loops than elsewhere. Thornton et al. (THOR88) have summarized many observations indicating that related proteins usually differ most at the loops which join the more regular elements of secondary structure. (These observations are relevant not only to the variegation of potential binding domains but also to the insertion of binding domains into an outer surface protein of a genetic package, as discussed in a later section.)

Burial of hydrophobic surfaces so that bulk water is excluded is one of the strongest forces driving the binding of proteins to other molecules. Bulk water can be excluded from the region between two molecules only if the surfaces are complementary. We should test as many surface variations as possible to find one that is complementary to the target. The selection-through-binding isolates those proteins that are more nearly complementary to some surface on the target.

Proteins do not have distinct, countable faces. Therefore we define an "interaction set" to be a set of residues such that all members of the set can simultaneously touch one molecule of the target material without any atom of the target coming closer than van der Waals distance to any main-chain atom of the IPBD. The concept of a residue "touching" a molecule of the target is discussed below. From a picture of BPTI (such as FIG. 6-10, p. 225 of CREI84) we can see that residues 3, 7, 8, 10, 13, 39, 41, and 42 can all simultaneously contact a molecule the size and shape of myoglobin. We also see that residue 49 cannot touch a single myoglobin molecule simultaneously with any of the first set even though all are on the surface of BPTI. (It is not the intent of the present invention, however, to suggest that use of models is required to determine which part of the target molecule will actually be the site of binding by PBD.)

Variations in the position, orientation and nature of the side chains of the residues of the interaction set will alter the shape of the potential binding surface defined by that set. Any individual combination of such variations may result in a surface shape which is a better or a worse fit for the target surface. The effective diversity of a variegated population is measured by the number of distinct shapes the potentially complementary surfaces of the PBD can adopt, rather than the number of protein sequences. Thus, it is preferable to maximize the former number, when our knowledge of the IPBD permits us to do so.

To maximize the number of surface shapes generated when N residues are varied, all residues varied in a given round of variegation should be in the same interaction set, because variation of several residues in one interaction set generates an exponential number of different shapes of the potential binding surface.

If cassette mutagenesis is to be used to introduce the variegated DNA into the ipbd gene, the protein residues to be varied are, preferably, close enough together in sequence that the variegated DNA (vgDNA) encoding all of them can be made in one piece. The present invention is not limited to a particular length of vgDNA that can be synthesized. With current technology, a stretch of 60 amino acids (180 DNA bases) can be spanned.

Further, when there is reason to mutate residues further than sixty residues apart, one can use other mutational means, such as single-stranded-oligonucleotide-directed mutagenesis (BOTS85) using two or more mutating primers.

Alternatively, to vary residues separated by more than sixty residues, two cassettes may be mutated as follows: 1) vgDNA having a low level of variegation (for example, 20 to 400 fold variegation) is introduced into one cassette in the OCV, 2) cells are transformed and cultured, 3) vg OCV DNA is obtained, 4) a second segment of vgDNA is inserted into a second cassette in the OCV, and 5) cells are transformed and cultured, and GPs are harvested and subjected to selection-through-binding.

The composite level of variation preferably does not exceed the prevailing capabilities to a) produce very large numbers of independently transformed cells or b) detect small components in a highly varied population. The limits on the level of variegation are discussed later.

Data about the IPBD and the target that are useful in deciding which residues to vary in the variegation cycle include: 1) 3D structure, or at least a list of residues on the surface of the IPBD, 2) list of sequences homologous to IPBD, and 3) model of the target molecule or a stand-in for the target.

These data and an understanding of the behavior of different amino acids in proteins will be used to answer two questions:

1) which residues of the IPBD are on the outside and close enough together in space to touch the target simultaneously?

2) which residues of the IPBD can be varied with high probability of retaining the underlying IPBD structure?

Although an atomic model of the target material (obtained through X-ray crystallography, NMR, or other means) is preferred in such examination, it is not necessary. For example, if the target were a protein of unknown 3D structure, it would be sufficient to know the molecular weight of the protein and whether it were a soluble globular protein, a fibrous protein, or a membrane protein. Physical measurements, such as low-angle neutron diffraction, can determine the overall molecular shape, viz. the ratios of the principal moments of inertia. One can then choose a protein of known structure of the same class and similar size and shape to use as a molecular stand-in and yardstick. It is not essential to measure the moments of inertia of the target because, at low resolution, all proteins of a given size and class look much the same. The specific volumes are the same, all are more or less spherical and therefore all proteins of the same size and class have about the same radius of curvature. The radii of curvature of the two molecules determine how much of the two molecules can come into contact.

The most appropriate method of picking the residues of the protein chain at which the amino acids should be varied is by viewing, with interactive computer graphics, a model of the IPBD. A stick-figure representation of molecules is preferred. A suitable set of hardware is an Evans & Sutherland PS390 graphics terminal (Evans & Sutherland Corporation, Salt Lake City, Utah) and a MicroVAX II supermicro computer (Digital Equipment Corp., Maynard, Mass.). The computer should, preferably, have at least 150 megabytes of disk storage, so that the Brookhaven Protein Data Bank can be kept on line. A FORTRAN compiler, or some equally good higher-level language processor, is preferred for program development. Suitable programs for viewing and manipulating protein models include: a) PS-FRODO, written by T. A. Jones (JONE85) and distributed by the Biochemistry Department of Rice University, Houston, Tex.; and b) PROTEUS, developed by Dayringer, Tramontano, and Fletterick (DAYR86). Important features of PS-FRODO and PROTEUS that are needed to view and manipulate protein models for the purposes of the present invention are the abilities to: 1) display molecular stick figures of proteins and other molecules, 2) zoom and clip images in real time, 3) prepare various abstract representations of the molecules, such as a line joining C_(α)s and side group atoms, 4) compute and display solvent-accessible surfaces reasonably quickly, 5) point to and identify atoms, and 6) measure distance between atoms.

In addition, one could use theoretical calculations, such as dynamic simulations of proteins, to estimate whether a substitution at a particular residue of a particular amino-acid type might produce a protein of approximately the same 3D structure as the parent protein. Such calculations might also indicate whether a particular substitution will greatly affect the flexibility of the protein; calculations of this sort may be useful but are not required.

Residues whose mutagenesis is most likely to affect binding to a target molecule, without destabilizing the protein, are called the "principal set". Using the knowledge of which residues are on the surface of the IPBD (as noted above), we pick residues that are close enough together on the surface of the IPBD to touch a molecule of the target simultaneously without having any IPBD main-chain atom come closer than van der Waals distance (viz. 4.0 to 5.0 Å) to any target atom. For the purposes of the present invention, a residue of the IPBD "touches" the target if: a) a main-chain atom is within van der Waals distance, viz. 4.0 to 5.0 Å, of any atom of the target molecule, or b) the C_(β) is within D_(cutoff) of any atom of the target molecule so that a side-group atom could make contact with that atom.

Because side groups differ in size (cf. Table 35), some judgment is required in picking D_(cutoff). In the preferred embodiment, we will use D_(cutoff) = 8.0 Å, but other values in the range 6.0 Å to 10.0 Å could be used. If the IPBD has glycine (G) at a residue, we construct a pseudo C_(β) with the correct bond distance and angles and judge the ability of the residue to touch the target from this pseudo C_(β).
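The "touching" criterion just described can be expressed as a simple distance test. The Python sketch below is illustrative only: the function name `touches`, the representation of atoms as (x, y, z) tuples in Å, the use of 5.0 Å (the upper end of the stated 4.0 to 5.0 Å range) as the van der Waals threshold, and the coordinates in the example are all assumptions introduced here.

```python
import math


def touches(main_chain_atoms, c_beta, target_atoms, d_vdw=5.0, d_cutoff=8.0):
    """Return True if a residue 'touches' the target.

    a) any main-chain atom lies within d_vdw of any target atom, or
    b) the C-beta (or pseudo C-beta for glycine) lies within d_cutoff
       of any target atom.
    Coordinates are (x, y, z) tuples in Angstroms.
    """
    for target_atom in target_atoms:
        if any(math.dist(atom, target_atom) <= d_vdw for atom in main_chain_atoms):
            return True
        if math.dist(c_beta, target_atom) <= d_cutoff:
            return True
    return False


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    main_chain = [(0.0, 0.0, 0.0)]
    c_beta = (1.5, 0.0, 0.0)
    print(touches(main_chain, c_beta, [(9.0, 0.0, 0.0)]))   # C-beta is 7.5 A away
    print(touches(main_chain, c_beta, [(10.0, 0.0, 0.0)]))  # C-beta is 8.5 A away
```

In practice the coordinates would come from a model of the IPBD docked against the target or its stand-in; the sketch only shows how the two distance thresholds combine.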

Alternatively, we choose a set of residues on the surface of the IPBD such that the curvature of the surface defined by the residues in the set is not so great that it would prevent contact between all residues in the set and a molecule of the target. This method is appropriate if the target is a macromolecule, such as a protein, because the PBDs derived from the IPBD will contact only a part of the macromolecular surface. The surfaces of macromolecules are irregular with varying curvatures. If we pick residues that define a surface that is not too convex, then there will be a region on a macromolecular target with a compatible curvature.

In addition to the geometrical criteria, we prefer that there be some indication that the underlying IPBD structure will tolerate substitutions at each residue in the principal set of residues. Indications could come from various sources, including: a) homologous sequences, b) static computer modeling, or c) dynamic computer simulations.

The residues in the principal set need not be contiguous in the protein sequence and usually are not. The exposed surfaces of the residues to be varied do not need to be connected. We desire only that the amino acids in the residues to be varied all be capable of touching a molecule of the target material simultaneously without having atoms overlap. If the target were, for example, horse heart myoglobin, and if the IPBD were BPTI, any set of residues in one interaction set of BPTI defined in Table 34 could be picked.

The secondary set comprises those residues not in the primary (principal) set that touch residues in the primary set. These residues might be excluded from the primary set because: a) the residue is internal, b) the residue is highly conserved, or c) the residue is on the surface, but the curvature of the IPBD surface prevents the residue from being in contact with the target at the same time as one or more residues in the primary set.

Internal residues are frequently conserved and the amino acid type cannot be changed to a significantly different type without substantial risk that the protein structure will be disrupted. Nevertheless, some conservative changes of internal residues, such as I to L or F to Y, are tolerated. Such conservative changes subtly affect the placement and dynamics of adjacent protein residues and such "fine tuning" may be useful once an SBD is found.

Surface residues in the secondary set are most often located on the periphery of the principal set. Such peripheral residues cannot make direct contact with the target simultaneously with all the other residues of the principal set. The charge on the amino acid in one of these residues could, however, have a strong effect on binding. Once an SBD is found, it is appropriate to vary the charge of some or all of these residues. For example, the variegated codon containing equimolar A and G at base 1, equimolar C and A at base 2, and A at base 3 yields amino acids T, A, K, and E with equal probability.
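The amino acid distribution produced by such a variegated codon can be checked by enumerating the encoded codons against the standard genetic code. The Python sketch below is illustrative only; the function name `codon_distribution` and the representation of each base position as a dictionary of base fractions are assumptions introduced here.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_distribution(mix1, mix2, mix3):
    """Amino acid probabilities for a variegated codon.

    Each argument maps a base to its fraction at that codon position
    (fractions at each position should sum to 1).
    """
    dist = defaultdict(float)
    for (b1, f1), (b2, f2), (b3, f3) in product(
            mix1.items(), mix2.items(), mix3.items()):
        dist[CODON_TABLE[b1 + b2 + b3]] += f1 * f2 * f3
    return dict(dist)


if __name__ == "__main__":
    # Equimolar A/G at base 1, equimolar C/A at base 2, A at base 3.
    example = codon_distribution({"A": 0.5, "G": 0.5},
                                 {"C": 0.5, "A": 0.5},
                                 {"A": 1.0})
    for amino_acid, prob in sorted(example.items()):
        print(amino_acid, prob)
```

The four codons ACA, AAA, GCA, and GAA encode T, K, A, and E respectively, each with probability 0.25. Unequal base fractions can be passed in the same way to examine non-equimolar mixtures.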

The assignment of residues to the primary and secondary sets may be based on: a) geometry of the IPBD and the geometrical relationship between the IPBD and the target (or a stand-in for the target) in a hypothetical complex, and b) sequences of proteins homologous to the IPBD. However, it should be noted that the distinction between the principal set and the secondary set is one more of convenience than of substance; we could just as easily have assigned each amino acid residue in the domain a preference score that weighed together the different considerations affecting whether they are suitable for variegation, and then ranked the residues in order, from most preferred to least.

For any given round of variegation, it may be necessary to limit the variegation to a subset of the residues in the primary and secondary sets, based on geometry and on the maximum allowed level of variegation that assures progressivity. The allowed level of variegation determines how many residues can be varied at once; geometry determines which ones.

The user may pick residues to vary in many ways. For example, pairs of residues are picked that are diametrically opposed across the face of the principal set. Two such pairs are used to delimit the surface, up/down and right/left. Alternatively, three residues that form an inscribed triangle, having as large an area as possible, on the surface are picked. One to three other residues are picked in a checkerboard fashion across the interaction surface. Choice of widely spaced residues to vary creates the possibility for high specificity because all the intervening residues must have acceptable complementarity before favorable interactions can occur at widely-separated residues.

The number of residues picked is coupled to the range through which each can be varied by the restrictions discussed below. In the first round, we do not assume any binding between IPBD and the target and so progressivity is not an issue. At the first round, the user may elect to produce a level of variegation such that each molecule of vgDNA is potentially different through, for example, unlimited variegation of 10 codons (20¹⁰, approximately 10¹³). One run of the DNA synthesizer produces approximately 10¹³ molecules of length 100 nts. Inefficiencies in ligation and transformation will reduce the number of proteins actually tested to between 10⁷ and 5·10⁸. Multiple replications of the process with such very high levels of variegation will not yield repeatable results; the user decides whether this is important.

III.C. Determining the Substitution Set for Each Parental Residue

Having picked which residues to vary, we now decide the range of amino acids to allow at each variable residue. The total level of variegation is the product of the number of variants at each varied residue. Each varied residue can have a different scheme of variegation, producing 2 to 20 different possibilities. The set of amino acids which are potentially encoded by a given variegated codon is called its "substitution set".
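The product rule for the total level of variegation can be illustrated with a short calculation. The Python sketch below is illustrative only; the function name `variegation_level` is a hypothetical helper, and the two patterns shown are the ten fully varied codons and the mixed five-residue pattern mentioned earlier in this section.

```python
import math


def variegation_level(substitution_set_sizes):
    """Total number of amino acid sequences: the product of the number of
    variants allowed at each varied residue."""
    return math.prod(substitution_set_sizes)


if __name__ == "__main__":
    # Ten codons, each varied through all twenty amino acids.
    full = variegation_level([20] * 10)
    print(f"10 codons x 20 amino acids: {full:.3g} sequences")

    # Two residues through 20, two through 2, and one through 10 amino acids.
    mixed = variegation_level([20, 20, 2, 2, 10])
    print("mixed pattern:", mixed, "sequences")

    # Fraction of the fully varied population sampled by 5e8 transformants
    # (upper figure quoted in the text), ignoring duplicates.
    print(f"fraction sampled: {5e8 / full:.2g}")
```

The first pattern gives 20¹⁰, about 10¹³ sequences, far more than the 10⁷ to 5·10⁸ proteins actually tested, whereas the mixed pattern gives only 16,000 sequences and can be sampled exhaustively.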

The computer that controls a DNA synthesizer, such as the Milligen 7500, can be programmed to synthesize any base of an oligo-nt with any distribution of nts by taking some nt substrates (e.g. nt phosphoramidites) from each of two or more reservoirs. Alternatively, nt substrates can be mixed in any ratios and placed in one of the extra reservoirs for so-called "dirty bottle" synthesis. Each codon could be programmed differently. The "mix" of bases at each nucleotide position of the codon determines the relative frequency of occurrence of the different amino acids encoded by that codon.

Simply variegated codons are those in which those nucleotide positionswhich are degenerate are obtained from a mixture of two or more basesmixed in equimolar proportions. These mixtures are described in thisspecification by means of the standardized "ambiguous nucleotide" code(Table 1 and 37 CFR §1.822). In this code, for example, in thedegenerate codon "SNT", "S" denotes an equimolar mixture of bases G andC, "N", an equimolar mixture of all four bases, and "T", the singleinvariant base thymidine.

Complexly variegated codons are those in which at least one of the threepositions is filled by a base from an other than equimolar mixture oftwo of more bases.

Either simply or complexly variegated codons may be used to achieve thedesired substitution set.

If we have no information indicating that a particular amino acid or class of amino acid is appropriate, we strive to substitute all amino acids with equal probability because representation of one mini-protein above the detectable level is wasteful. Equal amounts of all four nts at each position in a codon (NNN) yield the amino acid distribution in which each amino acid is present in proportion to the number of codons that code for it. This distribution has the disadvantage of giving two basic residues for every acidic residue. In addition, six times as much R, S, and L as W or M occur. If five codons are synthesized with this distribution, each of the 243 sequences encoding some combination of L, R, and S is 7776-times more abundant than each of the 32 sequences encoding some combination of W and M. To have five Ws present at detectable levels, we must have each of the (L,R,S) sequences present in 7776-fold excess.
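The NNN bias described above can be checked by counting codons in the standard genetic code. The following Python sketch is illustrative only and is not part of the original disclosure; the names `CODON_TABLE` and `codons_per_aa` are hypothetical, and the basic/acidic comparison counts only K and R against D and E, as in the text.

```python
from collections import Counter

# Standard genetic code, bases ordered T, C, A, G at each position ('*' = stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Under equimolar NNN every codon is equally likely, so the abundance of an
# amino acid is proportional to its number of codons.
codons_per_aa = Counter(CODON_TABLE.values())

basic = codons_per_aa["K"] + codons_per_aa["R"]    # K and R only (assumption)
acidic = codons_per_aa["D"] + codons_per_aa["E"]

n_codons = 5
lrs_sequences = 3 ** n_codons     # peptides built only from L, R, S
wm_sequences = 2 ** n_codons      # peptides built only from W, M
# Each L/R/S residue has 6 codons, each W/M residue has 1.
excess = (codons_per_aa["L"] // codons_per_aa["W"]) ** n_codons

print("basic:acidic codons =", basic, ":", acidic)
print("L/R/S sequences:", lrs_sequences, " W/M sequences:", wm_sequences)
print("fold excess of each L/R/S sequence:", excess)
```

The same table can be reused to evaluate any other mixture of bases, which is how the later variegation schemes are compared.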

Preferably, we also consider the interactions between the sites of variegation and the surrounding DNA. If the method of mutagenesis to be used is replacement of a cassette, we consider whether the variegation will generate gratuitous restriction sites and whether they seriously interfere with the intended introduction of diversity. We reduce or eliminate gratuitous restriction sites by appropriate choice of variegation pattern and silent alteration of codons neighboring the sites of variegation.

It is generally accepted that the sequence of amino acids in a protein or polypeptide determines the three-dimensional structure of the molecule, including the possibility of no definite structure. Among polypeptides of definite length and sequence, some have a defined tertiary structure and most do not.

Particular amino acid residues can influence the tertiary structure of a defined polypeptide in several ways, including by:

a) affecting the flexibility of the polypeptide main chain,

b) adding hydrophobic groups,

c) adding charged groups,

d) allowing hydrogen bonds, and

e) forming cross-links, such as disulfides, chelation to metal ions, or bonding to prosthetic groups.

Most works on proteins classify the twenty amino acids into categories such as hydrophobic/hydrophilic, positive/negative/neutral, or large/small. These classifications are useful rules of thumb, but one must be careful not to oversimplify. Proteins contain a variety of identifiable secondary structural features, including: a) α helices, b) 3-10 helices, c) anti-parallel β sheets, d) parallel β sheets, e) Ω loops, f) reverse turns, and g) various cross links. Many people have analyzed proteins of known structures and assigned each amino acid to one category or another. Using the frequency at which particular amino acids occur in various types of secondary structures, people have a) tried to predict the secondary structures of proteins for which only the amino-acid sequence is known (CHOU74, CHOU78a, CHOU78b), and b) designed proteins de novo that have a particular set of secondary structural elements (DEGR87, HECH90). Although some amino acids show definite predilection for one secondary form (e.g. VAL for β structure and ALA for α helices), these preferences are not very strong; Creighton has tabulated the preferences (CREI84). In only seven cases does the tendency exceed 2.0:

    ______________________________________
    Amino acid      distinction     ratio
    ______________________________________
    MET             α/turn          3.7
    PRO             turn/α          3.7
    VAL             β/turn          3.2
    GLY             turn/α          2.9
    ILE             β/turn          2.8
    PHE             β/turn          2.3
    LEU             α/turn          2.2
    ______________________________________

Every amino-acid type has been observed in every identified secondary structural motif. ARG is particularly indiscriminate.

PRO is generally taken to be a helix breaker. Nevertheless, proline often occurs at the beginning of helices or even in the middle of a helix, where it introduces a slight bend in the helix. Matthews and coworkers replaced a PRO that occurs near the middle of an α helix in T4 lysozyme. To their surprise, the "improved" protein is less stable than the wild-type. The rest of the structure had been adapted to fit the bent helix.

Lundeen (LUND86) has tabulated the frequencies of amino acids in helices, β strands, turns, and coil in proteins of known 3D structure and has distinguished between CYSs having free thiol groups and half cystines. He reports that free CYS is found most often in helices while half cystines are found more often in β sheets. Half cystines are, however, regularly found in helices. Pease et al. (PEAS90) constructed a peptide having two cystines; one end of each is in a very stable α helix. Apamin has a similar structure (WEMM83, PEAS88).

Flexibility

GLY is the smallest amino acid, having two hydrogens attached to the Cα. Because GLY has no Cβ, it confers the most flexibility on the main chain. Thus GLY occurs very frequently in reverse turns, particularly in conjunction with PRO, ASP, ASN, SER, and THR.

The amino acids ALA, SER, CYS, ASP, ASN, LEU, MET, PHE, TYR, TRP, ARG, HIS, GLU, GLN, and LYS have unbranched β carbons. Of these, the side groups of SER, ASP, and ASN frequently make hydrogen bonds to the main chain and so can take on main-chain conformations that are energetically unfavorable for the others. VAL, ILE, and THR have branched β carbons, which makes the extended main-chain conformation more favorable. Thus VAL and ILE are most often seen in β sheets. Because the side group of THR can easily form hydrogen bonds to the main chain, it has less tendency to exist in a β sheet.

The main chain of proline is particularly constrained by the cyclic side group. The φ angle is always close to -60°. Most prolines are found near the surface of the protein.

Charge

LYS and ARG carry a single positive charge at any pH below 10.4 or 12.0, respectively. Nevertheless, the methylene groups, four and three respectively, of these amino acids are capable of hydrophobic interactions. The guanidinium group of ARG is capable of donating five hydrogens simultaneously, while the amino group of LYS can donate only three. Furthermore, the geometries of these groups are quite different, so that these groups are often not interchangeable.

ASP and GLU carry a single negative charge at any pH above ≈4.5 and 4.6, respectively. Because ASP has but one methylene group, few hydrophobic interactions are possible. The geometry of ASP lends itself to forming hydrogen bonds to main-chain nitrogens, which is consistent with ASP being found very often in reverse turns and at the beginning of helices. GLU is more often found in α helices and particularly in the amino-terminal portion of these helices because the negative charge of the side group has a stabilizing interaction with the helix dipole (NICH88, SALI88).

HIS has an ionization pK in the physiological range, viz. 6.2. This pK can be altered by the proximity of charged groups or of hydrogen donors or acceptors. HIS is capable of forming bonds to metal ions such as zinc, copper, and iron.

Hydrogen Bonds

Aside from the charged amino acids, SER, THR, ASN, GLN, TYR, and TRP can participate in hydrogen bonds.

Cross Links

The most important form of cross link is the disulfide bond formed between two thiols, especially the thiols of CYS residues. In a suitably oxidizing environment, these bonds form spontaneously. These bonds can greatly stabilize a particular conformation of a protein or mini-protein. When a mixture of oxidized and reduced thiol reagents is present, exchange reactions take place that allow the most stable conformation to predominate. Concerning disulfides in proteins and peptides, see also KATZ90, MATS89, PERR84, PERR86, SAUE86, WELL86, JANA89, HORV89, KISH85, and SCHN86.

Other cross links that form without need of specific enzymes include:

    ______________________________________
    1) (CYS)₄:Fe               Rubredoxin (in CREI84, P. 376)
    2) (CYS)₄:Zn               Aspartate Transcarbamylase (in CREI84, P. 376) and Zn-fingers (HARD90)
    3) (HIS)₂(MET)(CYS):Cu     Azurin (in CREI84, P. 376) and Basic "Blue" Cu Cucumber protein (GUSS88)
    4) (HIS)₄:Cu               CuZn superoxide dismutase
    5) (CYS)₄:(Fe₄S₄)          Ferredoxin (in CREI84, P. 376)
    6) (CYS)₂(HIS)₂:Zn         Zinc-fingers (GIBS88)
    7) (CYS)₃(HIS):Zn          Zinc-fingers (GAUS87, GIBS88)
    ______________________________________

Cross links having (HIS)₂(MET)(CYS):Cu have the potential advantage that HIS and MET cannot form other cross links without Cu.

Simply Variegated Codons

The following simply variegated codons are useful because they encode a relatively balanced set of amino acids:

1) SNT which encodes the set [L,P,H,R,V,A,D,G]: a) one acidic (D) and one basic (R), b) both aliphatic (L,V) and aromatic hydrophobics (H), c) large (L,R,H) and small (G,A) side groups, d) rigid (P) and flexible (G) amino acids, e) each amino acid encoded once.

2) RNG which encodes the set [M,T,K,R,V,A,E,G]: a) one acidic and two basic (not optimal, but acceptable), b) hydrophilics and hydrophobics, c) each amino acid encoded once.

3) RMG which encodes the set [T,K,A,E]: a) one acidic, one basic, one neutral hydrophilic, b) three favor α helices, c) each amino acid encoded once.

4) VNT which encodes the set [L,P,H,R,I,T,N,S,V,A,D,G]: a) one acidic, one basic, b) all classes: charged, neutral hydrophilic, hydrophobic, rigid and flexible, etc., c) each amino acid encoded once.

5) RRS which encodes the set [N,S,K,R,D,E,G²]: a) two acidics, two basics, b) two neutral hydrophilics, c) only glycine encoded twice.

6) NNT which encodes the set [F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G]: a) sixteen DNA sequences provide fifteen different amino acids; only serine is repeated, all others are present in equal amounts (this allows very efficient sampling of the library), b) there are equal numbers of acidic and basic amino acids (D and R, once each), c) all major classes of amino acids are present: acidic, basic, aliphatic hydrophobic, aromatic hydrophobic, and neutral hydrophilic.

7) NNG, which encodes the set [L²,R²,S,W,P,Q,M,T,K,V,A,E,G, stop]: a) fair preponderance of residues that favor formation of α-helices [L,M,A,Q,K,E; and, to a lesser extent, S,R,T]; b) encodes 13 different amino acids. (VHG encodes a subset of the set encoded by NNG; this subset comprises 9 amino acids in nine different DNA sequences, with equal acids and bases, and 5/9 being α helix-favoring.)
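The substitution sets listed above follow mechanically from the ambiguous nucleotide code and the standard genetic code. The Python sketch below shows one way to enumerate them; it is an illustration rather than part of the original disclosure, and the names `IUPAC`, `CODON_TABLE`, and `substitution_set` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position ('*' = stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Ambiguous nucleotide code: each symbol stands for an equimolar mixture.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def substitution_set(vg_codon):
    """Map each encoded amino acid ('*' = stop) to its number of DNA sequences."""
    choices = [IUPAC[symbol] for symbol in vg_codon.upper()]
    return Counter(CODON_TABLE["".join(codon)] for codon in product(*choices))

for vg in ("SNT", "RNG", "RMG", "VNT", "RRS", "NNT", "NNG", "VHG"):
    encoded = substitution_set(vg)
    n_dna = sum(encoded.values())
    n_aa = len([aa for aa in encoded if aa != "*"])
    print(vg, n_dna, "DNA sequences,", n_aa, "amino acids:", dict(encoded))
```

A count greater than one in the output corresponds to a superscript in the bracketed sets, e.g. G² for RRS.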

For the initial variegation, NNT is preferred, in most cases. However, when the codon is encoding an amino acid to be incorporated into an α helix, NNG is preferred.

Below, we analyze several simple variegations as to the efficiency with which the libraries can be sampled.

Libraries of random hexapeptides encoded by (NNK)⁶ have been reported (SCOT90, CWIR90). Table 130 shows the expected behavior of such libraries. NNK produces single codons for PHE, TYR, CYS, TRP, HIS, GLN, ILE, MET, ASN, LYS, ASP, and GLU (α set); two codons for each of VAL, ALA, PRO, THR, and GLY (Φ set); and three codons for each of LEU, ARG, and SER (Ω set). We have separated the 64,000,000 possible sequences into 28 classes, shown in Table 130A, based on the number of amino acids from each of these sets. The largest class is ΦΩαααα with ≈14.6% of the possible sequences. Aside from any selection, all the sequences in one class have the same probability of being produced. Table 130B shows the probability that a given DNA sequence taken from the (NNK)⁶ library will encode a hexapeptide belonging to one of the defined classes; note that only ≈6.3% of DNA sequences belong to the ΦΩαααα class.
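The class bookkeeping described above can be reproduced with a short calculation. The Python sketch below is illustrative and not part of the original disclosure; it assumes that stop-containing DNA sequences are discarded (so that 31 codons per position remain), and all function names are hypothetical.

```python
from math import factorial

LENGTH = 6
TOTAL_AA = 20 ** LENGTH     # 64,000,000 hexapeptides
TOTAL_DNA = 31 ** LENGTH    # NNK gives 32 codons; the single stop is assumed removed

def classes(length=LENGTH):
    """Yield (n_alpha, n_phi, n_omega) for every composition class."""
    for a in range(length + 1):
        for p in range(length + 1 - a):
            yield a, p, length - a - p

def aa_sequences_in_class(a, p, o):
    """Number of distinct amino-acid sequences in a class.

    12 amino acids are in the alpha set, 5 in the phi set, 3 in the omega set;
    the multinomial coefficient counts the arrangements of the set labels.
    """
    arrangements = factorial(a + p + o) // (factorial(a) * factorial(p) * factorial(o))
    return arrangements * 12 ** a * 5 ** p * 3 ** o

def dna_per_aa_sequence(a, p, o):
    """Number of DNA sequences encoding any one amino-acid sequence of the class."""
    return 1 ** a * 2 ** p * 3 ** o

all_classes = list(classes())
largest = max(all_classes, key=lambda c: aa_sequences_in_class(*c))
aa_fraction = aa_sequences_in_class(*largest) / TOTAL_AA
dna_fraction = (aa_sequences_in_class(*largest)
                * dna_per_aa_sequence(*largest) / TOTAL_DNA)

print("number of classes:", len(all_classes))
print("largest class (alpha, phi, omega):", largest)
print("fraction of amino-acid sequences in it:", round(aa_fraction, 4))
print("fraction of DNA sequences in it:", round(dna_fraction, 4))
```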

Table 130C shows the expected numbers of sequences in each class for libraries containing various numbers of independent transformants (viz. 10⁶, 3·10⁶, 10⁷, 3·10⁷, 10⁸, 3·10⁸, 10⁹, and 3·10⁹). At 10⁶ independent transformants (ITs), we expect to see 56% of the ΩΩΩΩΩΩ class, but only 0.1% of the αααααα class. The vast majority of sequences seen come from classes for which less than 10% of the class is sampled. Suppose a peptide from, for example, class ΦΦΩΩαα is isolated by fractionating the library for binding to a target. Consider how much we know about peptides that are related to the isolated sequence. Because only 4% of the ΦΦΩΩαα class was sampled, we cannot conclude that the amino acids from the Ω set are in fact the best from the Ω set. We might have LEU at position 2, but ARG or SER could be better. Even if we isolate a peptide of the ΩΩΩΩΩΩ class, there is a noticeable chance that better members of the class were not present in the library.

With a library of 10⁷ ITs, we see that several classes have been completely sampled, but that the αααααα class is only 1.1% sampled. At 7.6·10⁷ ITs, we expect display of 50% of all amino-acid sequences, but the classes containing three or more amino acids of the α set are still poorly sampled. To achieve complete sampling of the (NNK)⁶ library requires about 3·10⁹ ITs, 10-fold larger than the largest (NNK)⁶ library so far reported.
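The sampling percentages quoted above are consistent with a simple model in which each independent transformant is a uniform random draw from the 31⁶ stop-free (NNK)⁶ DNA sequences, so that the chance of missing a given peptide follows a Poisson law. The Python sketch below rests on that modelling assumption, which is ours rather than a statement of how Table 130C was computed; the function names are hypothetical.

```python
from math import exp, factorial

def fraction_sampled(n_its, dna_per_aa, total_dna=31 ** 6):
    """Expected fraction of a class seen at least once among n_its transformants.

    dna_per_aa is the number of DNA sequences encoding each peptide of the
    class (2 per phi residue times 3 per omega residue).
    """
    return 1.0 - exp(-n_its * dna_per_aa / total_dna)

def library_fraction_sampled(n_its, length=6):
    """Expected fraction of all 20**length peptides seen at least once."""
    seen = 0.0
    for a in range(length + 1):
        for p in range(length + 1 - a):
            o = length - a - p
            arrangements = factorial(length) // (
                factorial(a) * factorial(p) * factorial(o))
            n_aa = arrangements * 12 ** a * 5 ** p * 3 ** o
            seen += n_aa * fraction_sampled(n_its, 2 ** p * 3 ** o, 31 ** length)
    return seen / 20 ** length

print("omega^6 class at 1e6 ITs:", round(fraction_sampled(1e6, 3 ** 6), 3))
print("alpha^6 class at 1e6 ITs:", round(fraction_sampled(1e6, 1), 4))
print("phi2 omega2 alpha2 class at 1e6 ITs:", round(fraction_sampled(1e6, 2 ** 2 * 3 ** 2), 3))
print("alpha^6 class at 1e7 ITs:", round(fraction_sampled(1e7, 1), 4))
print("whole library at 7.6e7 ITs:", round(library_fraction_sampled(7.6e7), 3))
```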

Table 131 shows expectations for a library encoded by (NNT)⁴ (NNG)². The expectations of abundance are independent of the order of the codons or of interspersed unvaried codons. This library encodes 0.133 times as many amino-acid sequences, but there are only 0.0165 times as many DNA sequences. Thus 5.0·10⁷ ITs (i.e. 60-fold fewer than required for (NNK)⁶) gives almost complete sampling of the library. The results would be slightly better for (NNT)⁶ and slightly, but not much, worse for (NNG)⁶. The controlling factor is the ratio of DNA sequences to amino-acid sequences.

Table 132 shows the ratio of #DNA sequences/#AA sequences for codons NNK, NNT, and NNG. For NNK and NNG, we have assumed that the PBD is displayed as part of an essential gene, such as gene III in Ff phage, as is indicated by the phrase "assuming stops vanish". It is not in any way required that such an essential gene be used. If a non-essential gene is used, the analysis would be slightly different; sampling of NNK and NNG would be slightly less efficient. Note that (NNT)⁶ gives 3.6-fold more amino-acid sequences than (NNK)⁵ but requires 1.7-fold fewer DNA sequences. Note also that (NNT)⁷ gives more than twice as many amino-acid sequences as (NNK)⁶ (about 2.7-fold), but 3.3-fold fewer DNA sequences.
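The comparisons above reduce to powers of two small numbers per codon: the count of stop-free DNA sequences and the count of distinct amino acids. The Python sketch below is illustrative only; the dictionary `SCHEMES` and the helper `library_size` are hypothetical names, and stops are assumed to vanish as stated in the text.

```python
# (stop-free DNA sequences per codon, distinct amino acids per codon)
SCHEMES = {
    "NNK": (31, 20),
    "NNT": (16, 15),
    "NNG": (15, 13),
}

def library_size(*codon_runs):
    """Return (#DNA sequences, #amino-acid sequences) for runs like ("NNT", 4)."""
    n_dna, n_aa = 1, 1
    for scheme, repeats in codon_runs:
        dna, aa = SCHEMES[scheme]
        n_dna *= dna ** repeats
        n_aa *= aa ** repeats
    return n_dna, n_aa

nnk5 = library_size(("NNK", 5))
nnk6 = library_size(("NNK", 6))
nnt6 = library_size(("NNT", 6))
nnt7 = library_size(("NNT", 7))
mixed = library_size(("NNT", 4), ("NNG", 2))

print("DNA/AA ratio per codon:",
      {name: round(dna / aa, 3) for name, (dna, aa) in SCHEMES.items()})
print("(NNT)6 vs (NNK)5: AA x", round(nnt6[1] / nnk5[1], 2),
      " DNA /", round(nnk5[0] / nnt6[0], 2))
print("(NNT)7 vs (NNK)6: AA x", round(nnt7[1] / nnk6[1], 2),
      " DNA /", round(nnk6[0] / nnt7[0], 2))
print("(NNT)4(NNG)2 vs (NNK)6: AA x", round(mixed[1] / nnk6[1], 3),
      " DNA x", round(mixed[0] / nnk6[0], 4))
```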

Thus, while it is possible to use a simple mixture (NNS, NNK or NNN) to obtain at a particular position all twenty amino acids, these simple mixtures lead to a highly biased set of encoded amino acids. This problem can be overcome by use of complexly variegated codons.

Complexly Variegated Codons

Let Abun(x) be the abundance of DNA sequences coding for amino acid x, defined by the distribution of nts at each base of the codon. For any distribution, there will be a most-favored amino acid (mfaa) with abundance Abun(mfaa) and a least-favored amino acid (lfaa) with abundance Abun(lfaa). We seek the nt distribution that allows all twenty amino acids and that yields the largest ratio Abun(lfaa)/Abun(mfaa) subject, if desirable, to further constraints.

We first will present the mixture calculated to be optimal when the nt distribution is subject to two constraints: equal abundances of acidic and basic amino acids and the least possible number of stop codons. Thus only nt distributions that yield Abun(E)+Abun(D)=Abun(R)+Abun(K) are considered, and the function maximized is:

    {(1-Abun(stop)) (Abun(lfaa)/Abun(mfaa))}.

We have simplified the search for an optimal nt distribution by limiting the third base to T or G (C or G is equivalent). All amino acids are possible and the number of accessible stop codons is reduced because TGA and TAA codons are eliminated. The amino acids F, Y, C, H, N, I, and D require T at the third base while W, M, Q, K, and E require G. Thus we use an equimolar mixture of T and G at the third base. However, it should be noted that the present invention embraces use of complexly variegated codons in which the third base is not limited to T or G (or to C or G).

A computer program, written as part of the present invention and named "Find Optimum vgCodon" (See Table 9), varies the composition at bases 1 and 2, in steps of 0.05, and reports the composition that gives the largest value of the quantity {(Abun(lfaa)/Abun(mfaa)) (1-Abun(stop))}. A vg codon is symbolically defined by the nucleotide distribution at each base:

    ______________________________________
                  T      C      A      G
    ______________________________________
    base #1 =     t1     c1     a1     g1
    base #2 =     t2     c2     a2     g2
    base #3 =     t3     c3     a3     g3
    t1 + c1 + a1 + g1 = 1.0
    t2 + c2 + a2 + g2 = 1.0
    t3 = g3 = 0.5, c3 = a3 = 0.
    ______________________________________

The variation of the quantities t1, c1, a1, g1, t2, c2, a2, and g2 is subject to the constraint that:

    Abun(E)+Abun(D)=Abun(K)+Abun(R)

    Abun(E)+Abun(D)=g1*a2

    Abun(K)+Abun(R)=a1*a2/2+c1*g2+a1*g2/2

    g1*a2=a1*a2/2+c1*g2+a1*g2/2

Solving for g2, we obtain

    g2=(g1*a2-0.5*a1*a2)/(c1+0.5*a1).

In addition,

    t1=1-a1-c1-g1

    t2=1-a2-c2-g2.

We vary a1, c1, g1, a2, and c2 and then calculate t1, g2, and t2. Initially, variation is in steps of 5%. Once an approximately optimum distribution of nucleotides is determined, the region is further explored with steps of 1%. The logic of this program is shown in Table 9. The optimum distribution (the "fxS" codon) is shown in Table 10A and yields DNA molecules encoding each type of amino acid with the abundances shown.
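The constrained search described above can be sketched in a few lines of Python. This sketch is our own illustration and is not the program of Table 9: the function names are hypothetical, only the coarse grid pass is shown (the finer 1% refinement is omitted), and the demonstration call uses a step of 0.1 so that it runs quickly.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position ('*' = stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def abundances(p1, p2, p3):
    """Abun(x) for each amino acid x ('*' = stop), given base fractions per position."""
    abun = {}
    for (b1, f1), (b2, f2), (b3, f3) in product(p1.items(), p2.items(), p3.items()):
        aa = CODON_TABLE[b1 + b2 + b3]
        abun[aa] = abun.get(aa, 0.0) + f1 * f2 * f3
    return abun

def score(abun):
    """(1 - Abun(stop)) * Abun(lfaa) / Abun(mfaa); zero if any amino acid is absent."""
    levels = [abun.get(aa, 0.0) for aa in AMINO_ACIDS]
    if min(levels) <= 0.0:
        return 0.0
    return (1.0 - abun.get("*", 0.0)) * min(levels) / max(levels)

def find_optimum_vgcodon(step=0.05):
    """Grid search over a1, c1, g1, a2, c2; t1, g2, t2 follow from the constraints."""
    n = round(1.0 / step)
    p3 = {"T": 0.5, "G": 0.5}
    best_score, best_codon = 0.0, None
    for ia1, ic1, ig1 in product(range(1, n), repeat=3):
        it1 = n - ia1 - ic1 - ig1
        if it1 < 1:
            continue
        a1, c1, g1, t1 = ia1 / n, ic1 / n, ig1 / n, it1 / n
        for ia2, ic2 in product(range(1, n), repeat=2):
            a2, c2 = ia2 / n, ic2 / n
            # Enforce Abun(E)+Abun(D) = Abun(K)+Abun(R) by solving for g2.
            g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)
            t2 = 1.0 - a2 - c2 - g2
            if g2 <= 0.0 or t2 <= 0.0:
                continue
            p1 = {"T": t1, "C": c1, "A": a1, "G": g1}
            p2 = {"T": t2, "C": c2, "A": a2, "G": g2}
            s = score(abundances(p1, p2, p3))
            if s > best_score:
                best_score, best_codon = s, (p1, p2, p3)
    return best_score, best_codon

# Coarse demonstration run; a step of 0.05 follows the text but takes longer.
best_score, best_codon = find_optimum_vgcodon(step=0.1)
print("best score on coarse grid:", round(best_score, 4))
print("base 1:", best_codon[0])
print("base 2:", best_codon[1])
```

Because g2 is computed from the constraint rather than taken from the grid, every candidate satisfies the acid/base equality exactly, at the cost of g2 and t2 falling off the grid.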

Note that this chemistry encodes all twenty amino acids, with acidic and basic amino acids being equi-probable, and the most favored amino acid (serine) is encoded only 2.454 times as often as the least favored amino acid (tryptophan). The "fxS" vg codon improves sampling most for peptides containing several of the amino acids [F,Y,C,W,H,Q,I,M,N,K,D,E] for which NNK or NNS provide only one codon. Its sampling advantages are most pronounced when the library is relatively small.

A modification of "Find Optimum vgCodon" varies the composition at bases 1 and 2, in steps of 0.01, and reports the composition that gives the largest value of the quantity {(Abun(lfaa)/Abun(mfaa))} without any restraint on the relative abundance of any amino acids. The results of this optimization are shown in Table 10B. The changes are small, indicating that insisting on equality of acids and bases and minimizing stop codons costs us little. Also note that, without restraining the optimization, the prevalences of acidic and basic amino acids come out fairly close. On the other hand, relaxing the restriction leaves a distribution in which the least favored amino acid is only 0.412 times as prevalent as SER.

The advantages of an NNT codon are discussed elsewhere in the present application. Unoptimized NNT provides 15 amino acids encoded by only 16 DNA sequences. It is possible to improve on NNT as follows. First note that the SER codons occur in the T and A rows of the genetic-code table and in the C and G columns.

    [SER]=T₁×C₂+A₁×G₂

If we reduce the prevalence of SER by reducing T₁, C₂, A₁, and G₂ relative to other bases, then we will also reduce the prevalence of PHE, TYR, CYS, PRO, THR, ALA, ARG, GLY, ILE, and ASN. The prevalence of LEU, HIS, VAL, and ASP will rise. If we assume that T₁, C₂, A₁, and G₂ are all lowered to the same extent and that C₁, G₁, T₂, and A₂ are increased by the same amount, we can compute a shift that makes the prevalence of SER equal the prevalences of LEU, HIS, VAL, and ASP. The decreases in PHE, TYR, CYS, PRO, THR, ALA, ARG, GLY, ILE, and ASN are not equal; CYS and THR are reduced more than the others.

Let the distribution be

    ______________________________________
                  T        C        A        G
    ______________________________________
    base #1 =     .25-q    .25+q    .25-q    .25+q
    base #2 =     .25+q    .25-q    .25+q    .25-q
    base #3 =     1.00     0.0      0.0      0.0
    Setting [SER] = [LEU] = [HIS] = [VAL] = [ASP] gives:
    (.25-q) · (.25-q) + (.25-q) · (.25-q) = (.25+q) · (.25+q)
    2 · (.25-q)² = (.25+q)²
    q² - 1.5 q + .0625 = 0
    q = (1.5 - √(1.5² - 4 · .0625))/2 = (1.5 - √2)/2 ≈ 0.0429
    ______________________________________

This distribution (shown in Table 10C) gives five amino acids (SER, LEU, HIS, VAL, ASP) in equal amounts. A further eight amino acids (PHE, TYR, ILE, ASN, PRO, ALA, ARG, GLY) are present at 78% the abundance of SER. THR and CYS remain at half the abundance of SER. When variegating DNA for disulfide-bonded mini-proteins, it is often desirable to reduce the prevalence of CYS. This distribution allows 13 amino acids to be seen at high level and gives no stops; the optimized fxS distribution allows only 11 amino acids at high prevalence.
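The shift q and the resulting abundances can be checked numerically. The Python sketch below is an illustration under the idealized symmetric assumption stated above (the same shift q applied to all eight base fractions); it is not a reproduction of Table 10C, whose tabulated composition may differ slightly, and the variable names are hypothetical.

```python
from math import sqrt

# Standard genetic code, bases ordered T, C, A, G at each position ('*' = stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Smaller root of q^2 - 1.5 q + 0.0625 = 0 (the larger root exceeds 0.25).
q = (1.5 - sqrt(1.5 ** 2 - 4 * 0.0625)) / 2

p1 = {"T": 0.25 - q, "C": 0.25 + q, "A": 0.25 - q, "G": 0.25 + q}
p2 = {"T": 0.25 + q, "C": 0.25 - q, "A": 0.25 + q, "G": 0.25 - q}

# Third base is always T, so each amino acid's abundance is a sum of f1 * f2.
abun = {}
for b1, f1 in p1.items():
    for b2, f2 in p2.items():
        aa = CODON_TABLE[b1 + b2 + "T"]
        abun[aa] = abun.get(aa, 0.0) + f1 * f2

print("q =", round(q, 4))
for aa in sorted(abun, key=abun.get, reverse=True):
    print(aa, round(abun[aa], 4), "relative to SER:", round(abun[aa] / abun["S"], 3))
```

In this idealized solution the eight intermediate amino acids come out at (.25-q)(.25+q), which is 1/√2 of the SER abundance; figures quoted from Table 10C may rest on a slightly different composition.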

The NNG codon can also be optimized. Table 10D shows an approximately optimized NNG codon. When equimolar T,C,A,G are used in NNG, one obtains double doses of LEU and ARG. To improve the distribution, we increase G₁ by 4δ, decrease T₁ and A₁ by δ each and C₁ by 2δ. We adopt this pattern because C₁ affects both LEU and ARG while T₁ and A₁ each affect either LEU or ARG, but not both. Similarly, we decrease T₂ and G₂ by τ while we increase C₂ and A₂ by τ. We adjusted δ and τ until [ALA]≈[ARG]. There are, under this variegation, four equally most favored amino acids: LEU, ARG, ALA, and GLU. Note that there is one acidic and one basic amino acid in this set. There are two equally least favored amino acids: TRP and MET. The ratio of lfaa/mfaa is 0.5258. If this codon is repeated six times, peptides composed entirely of TRP and MET are 2% as common as peptides composed entirely of the most favored amino acids. We refer to this as "the prevalence of (TRP/MET)⁶ in optimized NNG⁶ vgDNA".

When synthesizing vgDNA by the "dirty bottle" method, it is sometimes desirable to use only a limited number of mixes. One very useful mixture is called the "optimized NNS mixture" in which we average the first two positions of the fxS mixture: T₁=0.24, C₁=0.17, A₁=0.33, G₁=0.26, the second position is identical to the first, C₃=G₃=0.5. This distribution provides the amino acids ARG, SER, LEU, GLY, VAL, THR, ASN, and LYS at greater than 5% plus ALA, ASP, GLU, ILE, MET, and TYR at about 4%.
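The abundances produced by the "optimized NNS mixture" follow directly from the stated base fractions. The Python sketch below evaluates them; it is illustrative only, uses the rounded fractions given in the text, and the function name `abundances` is hypothetical.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position ('*' = stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def abundances(p1, p2, p3):
    """Abundance of each amino acid ('*' = stop) for given base fractions."""
    abun = {}
    for (b1, f1), (b2, f2), (b3, f3) in product(p1.items(), p2.items(), p3.items()):
        aa = CODON_TABLE[b1 + b2 + b3]
        abun[aa] = abun.get(aa, 0.0) + f1 * f2 * f3
    return abun

# Rounded composition from the text; positions 1 and 2 are identical.
first_two = {"T": 0.24, "C": 0.17, "A": 0.33, "G": 0.26}
third = {"C": 0.5, "G": 0.5}

nns_opt = abundances(first_two, first_two, third)
for aa in sorted(nns_opt, key=nns_opt.get, reverse=True):
    print(aa, f"{100 * nns_opt[aa]:.2f}%")
```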

An additional complexly variegated codon is of interest. This codon is identical to the optimized NNT codon at the first two positions and has T:G::90:10 at the third position. This codon provides thirteen amino acids (ALA, ILE, ARG, SER, ASP, LEU, VAL, PHE, ASN, GLY, PRO, TYR, and HIS) at more than 5.5%. THR at 4.3% and CYS at 3.9% are more common than the LFAAs of NNK (3.125%). The remaining five amino acids are present at less than 1%. This codon has the feature that all amino acids are present; sequences having more than two of the low-abundance amino acids are rare. When we isolate an SBD using this codon, we can be reasonably sure that the first 13 amino acids were tested at each position. A similar codon, based on optimized NNG, could be used.

Table 10E shows some properties of an unoptimized NNS (or NNK) codon. Note that there are three equally most-favored amino acids: ARG, LEU, and SER. There are also twelve equally least favored amino acids: PHE, ILE, MET, TYR, HIS, GLN, ASN, LYS, ASP, GLU, CYS, and TRP. Five amino acids (PRO, THR, ALA, VAL, GLY) fall in between. Note that a six-fold repetition of NNS gives sequences composed of the amino acids [PHE, ILE, MET, TYR, HIS, GLN, ASN, LYS, ASP, GLU, CYS, and TRP] at only ≈0.1% of the sequences composed of [ARG, LEU, and SER]. Not only is this ≈20-fold lower than the prevalence of (TRP/MET)⁶ in optimized NNG⁶ vgDNA, but this low prevalence applies to twelve amino acids.

Diffuse Mutagenesis

Diffuse Mutagenesis can be applied to any part of the protein at any time, but is most appropriate when some binding to the target has been established. Diffuse Mutagenesis can be accomplished by spiking each of the pure nts activated for DNA synthesis (e.g. nt-phosphoramidites) with a small amount of one or more of the other activated nts.

Contrary to general practice, the present invention sets the level of spiking so that only a small percentage (1% to 0.00001%, for example) of the final product will contain the initial DNA sequence. This will ensure that many single, double, triple, and higher mutations occur, but that recovery of the basic sequence will be a possible outcome. Let N_(b) be the number of bases to be varied, and let Q be the fraction of all sequences that should have the parental sequence; then M, the fraction of the mixture that is the majority component, is

    M=exp{log_e(Q)/N_(b)}=10^(log_10(Q)/N_(b)).

If, for example, thirty base pairs on the DNA chain were to be varied and 1% of the product is to have the parental sequence, then each mixed nt substrate should contain 86% of the parental nt and 14% of other nts. Table 8 shows the fraction (fn) of DNA molecules having n non-parental bases when 30 bases are synthesized with reagents that contain fraction M of the majority component. When M=0.63096, f24 and higher are less than 10⁻⁸. The entry "most" in Table 8 is the number of changes that has the highest probability. Note that substantial probability for multiple substitutions only occurs if the fraction of parental sequence (f0) is allowed to drop to around 10⁻⁶. The N_(b) base pairs of the DNA chain that are synthesized with mixed reagents need not be contiguous. They are picked so that between N_(b)/3 and N_(b) codons are affected to various degrees. The residues picked for mutation are picked with reference to the 3D structure of the IPBD, if known. For example, one might pick all or most of the residues in the principal and secondary set. We may impose restrictions on the extent of variation at each of these residues based on homologous sequences or other data. The mixture of non-parental nts need not be random; rather, mixtures can be biased to give particular amino acid types specific probabilities of appearance at each codon. For example, one residue may contain a hydrophobic amino acid in all known homologous sequences; in such a case, the first and third base of that codon would be varied, but the second would be set to T. Other examples of how this might be done are given in the horse heart myoglobin example. This diffuse structure-directed mutagenesis will reveal the subtle changes possible in protein backbone associated with conservative interior changes, such as V to I, as well as some not so subtle changes that require concomitant changes at two or more residues of the protein.
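The spiking level M and the distribution of the number of non-parental bases can be computed as follows. This Python sketch is illustrative and is not the calculation used to prepare Table 8; it assumes that each of the N_(b) positions is mutated independently with probability 1-M, so that the number of changes is binomially distributed, and the function names are hypothetical.

```python
from math import comb, exp, log

def majority_fraction(q, n_bases):
    """M such that a fraction q of products keeps the parental base at all n_bases."""
    return exp(log(q) / n_bases)

def mutation_distribution(m, n_bases):
    """f[n] = fraction of molecules with exactly n non-parental bases (binomial)."""
    return [comb(n_bases, n) * m ** (n_bases - n) * (1.0 - m) ** n
            for n in range(n_bases + 1)]

# Thirty varied bases, 1% parental sequence desired.
m_1pct = majority_fraction(0.01, 30)
print("M for Q = 1%, N_b = 30:", round(m_1pct, 4))

# Heavier spiking: M = 0.63096 corresponds to f0 of about 1e-6.
f = mutation_distribution(0.63096, 30)
most = max(range(len(f)), key=f.__getitem__)
print("f0 at M = 0.63096:", f[0])
print("most probable number of changes:", most)
```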

III.D. Special Considerations Relating to Variegation of Mini-Proteins with Essential Cysteines

Several of the preferred simple or complex variegated codons encode a set of amino acids which includes cysteine. This means that some of the encoded binding domains will feature one or more cysteines in addition to the invariant disulfide-bonded cysteines. For example, at each NNT-encoded position, there is a one in sixteen chance of obtaining cysteine. If six codons are so varied, the fraction of domains containing additional cysteines is about 0.32 (1 - (15/16)⁶). Odd numbers of cysteines can lead to complications; see Perry and Wetzel (PERR84, PERR86). On the other hand, many disulfide-containing proteins contain cysteines that do not form disulfides, e.g. trypsin. The possibility of unpaired cysteines can be dealt with in several ways:

First, the variegated phage population can be passed over an immobilized reagent that strongly binds free thiols, such as SulfoLink (catalogue number 44895 H from Pierce Chemical Company, Rockford, Ill., 61105). Another product from Pierce is TNB-Thiol Agarose (Catalogue Code 20409 H). BioRad sells Affi-Gel 401 (catalogue 153-4599) for this purpose.

Second, one can use a variegation that excludes cysteines, such as:

    ______________________________________                                        NHT that gives [F,S,Y,L,P,H,I,T,N,V,A,D],                                     VNS that gives [L.sup.2,P.sup.2,H,Q,R.sup.3,I,M,T.sup.2,N,K,S,V.sup.2,A.su    p.2, E,D,G.sup.2 ],                                                           NNG that gives [L.sup.2,S,W,P,Q,R.sup.2,M,T,K,R,V,A,E,G,stop],                SNT that gives [L,P,H.R.V.A.D.G],                                             RNG that gives [M,T,K,R,V,A,E,G],                                             RMG that gives [T,K,A,E],                                                     VNT that gives [L,P,H,R,I,T,N,S,V,A,D,G], or                                  RRS that gives [N,S,K,R,D,E,G.sup.2 ].                                        ______________________________________                                    

However, each of these schemes has one or more of the following disadvantages relative to NNT: a) fewer amino acids are allowed, b) amino acids are not evenly provided, c) acidic and basic amino acids are not equally likely, or d) stop codons occur. Nonetheless, NNG, NHT, and VNT are almost as useful as NNT. NNG encodes 13 different amino acids and one stop signal. Only two amino acids appear twice in the 16-fold mix.

Third, one can enrich the population for binding to the preselected target, and evaluate selected sequences post hoc for extra cysteines. Those that contain more cysteines than the cysteines provided for conformational constraint may be perfectly usable. It is possible that a disulfide linkage other than the designed one will occur. This does not mean that the binding domain defined by the isolated DNA sequence is in any way unsuitable. The suitability of the isolated domains is best determined by chemical and biochemical evaluation of chemically synthesized peptides.

Lastly, one can block free thiols with reagents, such as Ellman's reagent, iodoacetate, or methyl iodide, that specifically bind free thiols and that do not react with disulfides, and then leave the modified phage in the population. It is to be understood that the blocking agent may alter the binding properties of the mini-protein; thus, one might use a variety of blocking reagents in expectation that different binding domains will be found. The variegated population of thiol-blocked genetic packages is fractionated for binding. If the DNA sequence of the isolated binding mini-protein contains an odd number of cysteines, then synthetic means are used to prepare mini-proteins having each possible linkage and in which the odd thiol is appropriately blocked. Nishiuchi (NISH82, NISH86, and works cited therein) discloses methods of synthesizing peptides that contain a plurality of cysteines so that each thiol is protected with a different type of blocking group. These groups can be selectively removed so that the disulfide pairing can be controlled. We envision using such a scheme with the alteration that one thiol either remains blocked, or is unblocked and then reblocked with a different reagent.
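The amino-acid sets quoted in this section for each variegated codon, and the chance of picking up an extra cysteine, can be checked by enumeration. The following Python is a minimal sketch that assumes the standard genetic code and equimolar nucleotide mixtures at every variegated base; the function names `encoded` and `fraction_with_extra_cys` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# IUPAC ambiguity symbols used for the variegated codons in this section.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "H": "ACT", "V": "ACG", "S": "CG",
         "R": "AG", "M": "AC", "D": "AGT", "K": "GT"}


def encoded(vg_codon):
    """Count how often each amino acid ('*' = stop) arises from a variegated codon."""
    pools = [IUPAC[symbol] for symbol in vg_codon]
    return Counter(CODE["".join(codon)] for codon in product(*pools))


def fraction_with_extra_cys(vg_codon, n_positions):
    """Fraction of domains with at least one cysteine among n varied positions."""
    counts = encoded(vg_codon)
    p_cys = counts["C"] / sum(counts.values())
    return 1 - (1 - p_cys) ** n_positions


if __name__ == "__main__":
    for vg in ["NNT", "NHT", "VNS", "NNG", "SNT", "RNG", "RMG", "VNT", "RRS"]:
        print(vg, dict(sorted(encoded(vg).items())))
    print(round(fraction_with_extra_cys("NNT", 6), 3))
```

Under these assumptions the one-in-sixteen chance per NNT position compounds to roughly one third over six positions, and none of the cysteine-free alternatives listed above yields a cysteine codon.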

III.E. Planning the Second and Later Rounds of Variegation

The method of the present invention allows efficient accumulation of information concerning the amino-acid sequence of a binding domain having high affinity for a predetermined target. Although one may obtain a highly useful binding domain from a single round of variegation and affinity enrichment, we expect that multiple rounds will be needed to achieve the highest possible affinity and specificity.

If the first round of variegation results in some binding to the target, but the affinity for the target is still too low, further improvement may be achieved by variegation of the SBDs. Preferably, the process is progressive, i.e. each variegation cycle produces a better starting point for the next variegation cycle than the previous cycle produced. Setting the level of variegation such that the ppbd and many sequences related to the ppbd sequence are present in detectable amounts ensures that the process is progressive. If the level of variegation is so high that the ppbd sequence is present at such low levels that there is an appreciable chance that no transformant will display the PPBD, then the best SBD of the next round could be worse than the PPBD. At an excessively high level of variegation, each round of mutagenesis is independent of previous rounds and there is no assurance of progressivity. This approach can lead to valuable binding proteins, but repetition of experiments with this level of variegation will not yield progressive results. Excessive variation is not preferred.

Progressivity is not an all-or-nothing property. So long as most of the information obtained from previous variegation cycles is retained and many different surfaces that are related to the PPBD surface are produced, the process is progressive. If the level of variegation is so high that the ppbd gene may not be detected, the assurance of progressivity diminishes. If the probability of recovering PPBD is negligible, then the probability of progressive behavior is also negligible.

A level of variegation that allows recovery of the PPBD has two properties:

1) we cannot regress because the PPBD is available,

2) an enormous number of multiple changes related to the PPBD are available for selection and we are able to detect and benefit from these changes.

It is very unlikely that all of the variants will be worse than the PPBD; we desire the presence of PPBD at detectable levels to ensure that all the sequences present are indeed related to PPBD.

An opposing force in our design considerations is that PBDs are useful in the population only up to the amount that can be detected; any excess above the detectable amount is wasted. Thus we produce as many surfaces related to PPBD as possible within the constraint that the PPBD be detectable.
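The trade-off between diversity and recoverability of the parental sequence can be illustrated numerically. The sketch below is a simplified model, not the procedure of the specification: it assumes every DNA sequence in the variegated mix is equally abundant and that transformants are independent draws, and it ignores the sensitivity of the affinity separation. The function names and the 0.95 threshold are hypothetical.

```python
import math


def prob_sequence_present(n_dna_sequences, n_transformants):
    """Chance that one particular DNA sequence occurs at least once,
    assuming equally likely sequences and independent transformants."""
    if n_dna_sequences <= 1:
        return 1.0
    log_absent = n_transformants * math.log1p(-1.0 / n_dna_sequences)
    return 1.0 - math.exp(log_absent)


def max_varied_codons(codons_per_position, n_transformants, min_prob=0.95):
    """Largest number of fully varied positions that still keeps the
    parental sequence present with probability >= min_prob."""
    k = 0
    while prob_sequence_present(codons_per_position ** (k + 1),
                                n_transformants) >= min_prob:
        k += 1
    return k


if __name__ == "__main__":
    for m in (10**6, 10**7, 10**8):
        print(m, max_varied_codons(16, m))
```

With sixteen codons per position (as for NNT or NNG), this model suggests that each tenfold increase in independent transformants buys at most about one additional varied position, which is why the number of residues varied per round stays small.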

If the level of variegation in the previous variegation cycle was correctly chosen, then the amino acids selected to be in the residues just varied are the ones best determined. The environment of other residues has changed, so that it is appropriate to vary them again. Because there are often more residues in the principal and secondary sets than can be varied simultaneously, we start by picking residues that either have never been varied (highest priority) or that have not been varied for one or more cycles. If we find that varying all the residues except those varied in the previous cycle does not allow a high enough level of diversity, then residues varied in the previous cycle might be varied again. For example, if M_(ntv) (the number of independent transformants that can be produced from Y_(D100) of DNA) and C_(sensi) (the sensitivity of the affinity separation) were such that seven residues could be varied, and if the principal and secondary sets contained 13 residues, we would always vary seven residues, even though that implies varying some residue twice in a row. In such cases, we would pick the residues just varied that contain the amino acids of highest abundance in the variegated codons used.

It is the accumulation of information that allows the process to select those protein sequences that produce binding between the SBD and the target. Some interfaces between proteins and other molecules involve twenty or more residues. Complete variation of twenty residues would generate 10²⁶ different proteins. By dividing the residues that lie close together in space into overlapping groups of five to seven residues, we can vary a large surface but never need to test more than 10⁷ to 10⁹ candidates at once, a savings of 10¹⁹ to 10¹⁷ fold. The power of selection with accumulation of information is well illustrated in Chapter 3 of DAWK86.
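The orders of magnitude quoted above follow from simple counting. As a minimal sketch, assuming all twenty amino acids are allowed at each varied residue (the helper name `n_sequences` is hypothetical):

```python
def n_sequences(n_residues, n_amino_acids=20):
    """Number of distinct proteins when n_residues are each fully varied."""
    return n_amino_acids ** n_residues


if __name__ == "__main__":
    full = n_sequences(20)  # complete variation of twenty residues
    for group_size in (5, 6, 7):
        per_round = n_sequences(group_size)
        print(f"{group_size} residues: {per_round:.1e} candidates per round, "
              f"{full / per_round:.1e}-fold fewer than {full:.1e}")
```

Twenty residues fully varied give 20²⁰, about 1.05 × 10²⁶ sequences, whereas groups of five to seven residues give between 3.2 × 10⁶ and about 1.3 × 10⁹ candidates per round, in line with the figures in the text.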

Use of NNT or NNG variegated codons leads to very efficient sampling of variegated libraries because the ratio of (different amino-acid sequences)/(different DNA sequences) is much closer to unity than it is for NNK or even the optimized vg codon (fxS). Nevertheless, a few amino acids are omitted in each case. Both NNT and NNG allow members of all important classes of amino acids: hydrophobic, hydrophilic, acidic, basic, neutral hydrophilic, small, and large. After selecting a binding domain, a subsequent variegation and selection may be desirable to achieve a higher affinity or specificity. During this second variegation, amino acid possibilities overlooked by the preceding variegation may be investigated.
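The ratio of amino-acid sequences to DNA sequences can be compared for NNT, NNG, and NNK with a few lines of arithmetic. This sketch assumes equimolar bases and the standard genetic code, counts stop codons as non-productive, and does not model the optimized (fxS) codon; the table `VG_CODONS` and function `sampling_ratio` are hypothetical names.

```python
# (amino acids encoded, codons in the mix), from the standard genetic code
# with equimolar bases; stop codons are not counted as amino acids.
VG_CODONS = {
    "NNT": (15, 16),  # serine appears twice
    "NNG": (13, 16),  # leucine and arginine appear twice, plus one stop
    "NNK": (20, 32),  # K = G or T; all twenty amino acids plus one stop
}


def sampling_ratio(vg_codon, n_positions):
    """(different amino-acid sequences) / (different DNA sequences)
    when n_positions are each varied with the same variegated codon."""
    n_aa, n_dna = VG_CODONS[vg_codon]
    return (n_aa / n_dna) ** n_positions


if __name__ == "__main__":
    for vg in VG_CODONS:
        print(vg, round(sampling_ratio(vg, 6), 3))
```

For six varied positions the ratio is highest for NNT, intermediate for NNG, and several-fold lower for NNK, at the cost of the amino acids that NNT and NNG omit.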

In the first round, we assume that the parental protein has no known affinity for the target material. For example, consider the parental mini-protein, similar to that discussed in Example 11, having the structure X₁-C₂-X₃-X₄-X₅-X₆-C₇-X₈ (SEQ ID NO: 37) in which C₂ and C₇ form a disulfide bond. Introduction of extra cysteines may cause alternative structures to form which might be disadvantageous. Accidental cysteines at positions 4 or 5 are thought to be potentially more troublesome than at the other positions. We adopt the pattern of variegation: X₁:NNT, X₃:NNT, X₄:NNG, X₅:NNG, X₆:NNT, and X₈:NNT, so that cysteine cannot occur at positions 4 and 5 (DNA sequence NNT.TGT.NNT.NNG.NNG.NNT.TGT.NNT has SEQ ID NO. 89). (Table 131 shows the number of different amino acids expected in libraries prepared with DNA variegated in this way and comprising different numbers of independent transformants.)
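For the NNT/NNG pattern just described, the size of the library and the expected coverage at a given number of transformants can be estimated as follows. This is an illustrative model only and is not claimed to reproduce Table 131: it assumes equimolar bases, independent transformants, and counts only stop-free sequences. The codon multiplicities are taken from the standard genetic code; all names are hypothetical.

```python
import math
from collections import Counter

# Codon multiplicity -> number of amino acids with that many codons
# (stop codons excluded).
NNT = {1: 14, 2: 1}   # 15 amino acids; serine has two codons
NNG = {1: 11, 2: 2}   # 13 amino acids; leucine and arginine have two each
PATTERN = [NNT, NNT, NNG, NNG, NNT, NNT]  # positions X1, X3, X4, X5, X6, X8


def abundance_classes(pattern):
    """Map (DNA sequences per protein) -> number of proteins in that class."""
    classes = Counter({1: 1})
    for position in pattern:
        nxt = Counter()
        for mult, count in classes.items():
            for m, n_aa in position.items():
                nxt[mult * m] += count * n_aa
        classes = nxt
    return classes


def expected_distinct_proteins(n_transformants, pattern=PATTERN):
    """Expected number of different stop-free amino-acid sequences sampled."""
    n_dna = 16 ** len(pattern)
    total = 0.0
    for mult, count in abundance_classes(pattern).items():
        p = mult / n_dna  # chance that one transformant encodes this protein
        total += count * -math.expm1(n_transformants * math.log1p(-p))
    return total


if __name__ == "__main__":
    print(16 ** len(PATTERN), sum(abundance_classes(PATTERN).values()))
    for m in (10**6, 10**7, 10**8):
        print(m, round(expected_distinct_proteins(m)))
```

Under these assumptions the pattern comprises 16⁶ = 16,777,216 DNA sequences encoding 15⁴ × 13² = 8,555,625 stop-free amino-acid sequences.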

In the second round of variegation, a preferred strategy is to vary each position through a new set of residues which includes the amino acid(s) which were found at that position in the successful binding domains, and which includes as many as possible of the residues which were excluded in the first round of variegation.

A few examples may be helpful. Suppose we obtained PRO using NNT. This amino acid is available with either NNT or NNG. We can be reasonably sure that PRO is the best amino acid from the set [PRO, LEU, VAL, THR, ALA, ARG, GLY, PHE, TYR, CYS, HIS, ILE, ASN, ASP, SER]. Thus we need to try a set that includes [PRO, TRP, GLN, MET, LYS, GLU]. The set allowed by NNG is the preferred set.

What if we obtained HIS instead? Histidine is aromatic and fairly hydrophobic and can form hydrogen bonds to and from the imidazole ring. Tryptophan is hydrophobic and aromatic and can donate a hydrogen to a suitable acceptor and was excluded by the NNT codon. Methionine was also excluded and is hydrophobic. Thus, one preferred course is to use the variegated codon HDS that allows [HIS, GLN, ASN, LYS, TYR, CYS, TRP, ARG, SER, GLY, <stop>].

GLN can be encoded by the NNG codon. If GLN is selected, at the next round we might use the vg codon VAS that encodes three of the seven excluded possibilities, viz. HIS, ASN, and ASP. The codon VAS encodes six amino acids in six DNA sequences. This leaves PHE, CYS, TYR, and ILE untested, but these are all very hydrophobic. Switching to NNT would be undesirable because that would exclude GLN. One could use NAS that includes TYR and <stop>. Suppose the successful amino acid encoded by an NNG codon was ARG. Here we switch to NNT because this allows ARG plus all the excluded possibilities.

THR is another possibility with the NNT codon. If THR is selected, we switch to NNG because that includes the previously excluded possibilities and includes THR. Suppose the successful amino acid encoded by the NNT codon was ASP. We use RRS at the next variegation because this includes both acidic amino acids plus LYS and ARG. One could also use VRS to allow GLN.

Thus, later rounds of variegation test both amino acid positions not previously mutated, and amino acid substitutions at a previously mutated position which were not within the previous substitution set.
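The reasoning in the PRO, GLN, ARG, THR, and ASP examples can be expressed as a simple scoring rule: a candidate codon for the next round must retain the selected amino acid, and is better the more previously excluded amino acids it adds. The Python below is a hedged sketch of that rule only, assuming the standard genetic code; it ignores stop codons, codon redundancy, and the chemical considerations the text also weighs. The names `second_round_score` and `rank_candidates` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "V": "ACG", "S": "CG", "R": "AG"}
ALL_AMINO_ACIDS = set(AMINO) - {"*"}


def amino_acids(vg_codon):
    """Set of amino acids (stops dropped) encoded by a variegated codon."""
    pools = [IUPAC[symbol] for symbol in vg_codon]
    return {CODE["".join(c)] for c in product(*pools)} - {"*"}


def second_round_score(selected, previous, candidate):
    """Number of amino acids excluded by `previous` that `candidate` adds,
    or None if `candidate` would drop the selected amino acid."""
    cand = amino_acids(candidate)
    if selected not in cand:
        return None
    excluded = ALL_AMINO_ACIDS - amino_acids(previous)
    return len(excluded & cand)


def rank_candidates(selected, previous, candidates):
    """Usable candidates, best first, as (score, codon) pairs."""
    scored = [(second_round_score(selected, previous, c), c) for c in candidates]
    return sorted(((s, c) for s, c in scored if s is not None), reverse=True)


if __name__ == "__main__":
    print(rank_candidates("P", "NNT", ["NNT", "NNG"]))         # PRO found with NNT
    print(rank_candidates("R", "NNG", ["NNT", "NNG"]))         # ARG found with NNG
    print(rank_candidates("D", "NNT", ["NNG", "RRS", "VRS"]))  # ASP found with NNT
```

One-letter amino-acid codes are used in the sketch (P = PRO, R = ARG, D = ASP, Q = GLN). For ASP obtained with NNT, NNG is rejected because it cannot encode ASP, while RRS and VRS both remain usable, with VRS additionally admitting GLN.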

If the first round of variegation is entirely unsuccessful, a different pattern of variegation should be used. For example, if more than one interaction set can be defined within a domain, the residues varied in the next round of variegation should be from a different set than that probed in the initial variegation. If repeated failures are encountered, one may switch to a different IPBD.

IV. DISPLAY STRATEGY: DISPLAYING FOREIGN BINDING DOMAINS ON THE SURFACE OF A "GENETIC PACKAGE"

IV.A. General Requirements for Genetic Packages

It is emphasized that the GP on which selection-through-binding will be practiced must be capable, after the selection, either of growth in some suitable environment or of in vitro amplification and recovery of the encapsulated genetic message. During at least part of the growth, the increase in number is preferably approximately exponential with respect to time. The component of a population that exhibits the desired binding properties may be quite small, for example, one in 10⁶ or less. Once this component of the population is separated from the non-binding components, it must be possible to amplify it. Culturing viable cells is the most powerful amplification of genetic material known and is preferred. Genetic messages can also be amplified in vitro, e.g. by PCR, but this is not the most preferred method.
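To give a sense of scale for the amplification requirement, the sketch below estimates how many doublings are needed to restore a component that was one part in 10⁶ of the population, assuming ideal exponential growth of a cellular GP with a fixed doubling time. The function names and the example doubling times are illustrative assumptions only.

```python
import math


def doublings_needed(amplification_factor):
    """Whole number of doublings that achieves at least the given factor."""
    return math.ceil(math.log2(amplification_factor))


def hours_to_amplify(amplification_factor, doubling_minutes):
    """Time for that many doublings under ideal exponential growth."""
    return doublings_needed(amplification_factor) * doubling_minutes / 60


if __name__ == "__main__":
    for minutes in (20, 30, 40):
        print(minutes, doublings_needed(10**6), hours_to_amplify(10**6, minutes))
```

A millionfold amplification corresponds to about twenty doublings, so with doubling times of tens of minutes the recovered component can be regrown within a working day under this idealized model.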

Preferred GPs are vegetative bacterial cells, bacterial spores and bacterial DNA viruses. Eukaryotic cells could be used as genetic packages but have longer dividing times and more stringent nutritional requirements than do bacteria and it is much more difficult to produce a large number of independent transformants. They are also more fragile than bacterial cells and therefore more difficult to chromatograph without damage. Eukaryotic viruses could be used instead of bacteriophage but must be propagated in eukaryotic cells and therefore suffer from some of the amplification problems mentioned above.

Nonetheless, a strain of any living cell or virus is potentially useful if the strain can be: 1) genetically altered with reasonable facility to encode a potential binding domain, 2) maintained and amplified in culture, 3) manipulated to display the potential binding protein domain where it can interact with the target material during affinity separation, and 4) affinity separated while retaining the genetic information encoding the displayed binding domain in recoverable form. Preferably, the GP remains viable after affinity separation.

When the genetic package is a bacterial cell, or a phage which is assembled periplasmically, the display means has two components. The first component is a secretion signal which directs the initial expression product to the inner membrane of the cell (a host cell when the package is a phage). This secretion signal is cleaved off by a signal peptidase to yield a processed, mature, potential binding protein. The second component is an outer surface transport signal which directs the package to assemble the processed protein into its outer surface. Preferably, this outer surface transport signal is derived from a surface protein native to the genetic package.

For example, in a preferred embodiment, the hybrid gene comprises a DNA encoding a potential binding domain operably linked to a signal sequence (e.g., the signal sequences of the bacterial phoA or bla genes or the signal sequence of M13 phage gene III) and to DNA encoding a coat protein (e.g., the M13 gene III or gene VIII proteins) of a filamentous phage (e.g., M13). The expression product is transported to the inner membrane (lipid bilayer) of the host cell, whereupon the signal peptide is cleaved off to leave a processed hybrid protein. The C-terminus of the coat protein-like component of this hybrid protein is trapped in the lipid bilayer, so that the hybrid protein does not escape into the periplasmic space. (This is typical of the wild-type coat protein.) As the single-stranded DNA of the nascent phage particle passes into the periplasmic space, it collects both wild-type coat protein and the hybrid protein from the lipid bilayer. The hybrid protein is thus packaged into the surface sheath of the filamentous phage, leaving the potential binding domain exposed on its outer surface. (Thus, the filamentous phage, not the host bacterial cell, is the "replicable genetic package" in this embodiment.)

If a secretion signal is necessary for the display of the potential binding domain, in an especially preferred embodiment the bacterial cell in which the hybrid gene is expressed is of a "secretion-permissive" strain.

When the genetic package is a bacterial spore, or a phage whose coat is assembled intracellularly, a secretion signal directing the expression product to the inner membrane of the host bacterial cell is unnecessary. In these cases, the display means is merely the outer surface transport signal, typically a derivative of a spore or phage coat protein.

There are several methods of arranging that the ipbd gene is expressed in such a manner that the IPBD is displayed on the outer surface of the GP. If one or more fusions of fragments of x genes to fragments of a natural osp gene are known to cause X protein domains to appear on the GP surface, then we pick the DNA sequence in which an ipbd gene fragment replaces the x gene fragment in one of the successful osp-x fusions as a preferred gene to be tested for the display-of-IPBD phenotype. (The gene may be constructed in any manner.) If no fusion data are available, then we fuse an ipbd fragment to various fragments, such as fragments that end at known or predicted domain boundaries, of the osp gene and obtain GPs that display the osp-ipbd fusion on the GP outer surface by screening or selection for the display-of-IPBD phenotype. The OSP may be modified so as to increase the flexibility and/or length of the linkage between the OSP and the IPBD and thereby reduce interference between the two.

The fusion of ipbd and osp fragments may also include fragments of random or pseudorandom DNA to produce a population, members of which may display IPBD on the GP surface. The members displaying IPBD are isolated by screening or selection for the display-of-binding phenotype.

The replicable genetic entity (phage or plasmid) that carries the osp-pbd genes (derived from the osp-ipbd gene) through the selection-through-binding process is referred to hereinafter as the operative cloning vector (OCV). When the OCV is a phage, it may also serve as the genetic package. The choice of a GP is dependent in part on the availability of a suitable OCV and suitable OSP.

Preferably, the GP is readily stored, for example, by freezing. If the GP is a cell, it should have a short doubling time, such as 20-40 minutes. If the GP is a virus, it should be prolific, e.g., a burst size of at least 100/infected cell. GPs which are finicky or expensive to culture are disfavored. The GP should be easy to harvest, preferably by centrifugation. The GP is preferably stable for a temperature range of -70° to 42° C. (stable at 4° C. for several days or weeks); resistant to shear forces found in HPLC; insensitive to UV; tolerant of desiccation; and resistant to a pH of 2.0 to 10.0, surface active agents such as SDS or Triton, chaotropes such as 4M urea or 2M guanidinium HCl, common ions such as K⁺, Na⁺, and SO₄⁻⁻, common organic solvents such as ether and acetone, and degradative enzymes. Finally, there must be a suitable OCV.

Although knowledge of specific OSPs may not be required for vegetative bacterial cells and endospores, the user of the present invention, preferably, will know: Is the sequence of any osp known? (preferably yes, at least one required for phage). How does the OSP arrive at the surface of GP? (knowledge of route necessary, different routes have different uses, no route preferred per se). Is the OSP post-translationally processed? (no processing most preferred, predictable processing preferred over unpredictable processing). What rules are known governing this processing, if there is any processing? (no processing most preferred, predictable processing acceptable). What function does the OSP serve in the outer surface? (preferably not essential). Is the 3 D structure of an OSP known? (highly preferred). Are fusions between fragments of osp and a fragment of x known? Does expression of these fusions lead to X appearing on the surface of the GP? (fusion data is as preferred as knowledge of a 3 D structure). Is a "2 D" structure of an OSP available? (in this context, a "2 D" structure indicates which residues are exposed on the cell surface) (2 D structure less preferred than 3 D structure). Where are the domain boundaries in the OSP? (not as preferred as a 2 D structure, but acceptable). Could IPBD go through the same process as OSP and fold correctly? (IPBD might need prosthetic groups) (preferably IPBD will fold after same process). Is the sequence of an osp promoter known? (preferably yes). Is an osp gene controlled by a regulatable promoter available? (preferably yes). What activates this promoter? (preferably a diffusible chemical, such as IPTG). How many different OSPs do we know? (the more the better). How many copies of each OSP are present on each package? (more is better).

The user will want knowledge of the physical attributes of the GP: How large is the GP? (knowledge useful in deciding how to isolate GPs) (preferably easy to separate from soluble proteins such as IgGs). What is the charge on the GP? (neutral preferred). What is the sedimentation rate of the GP? (knowledge preferred, no particular value preferred).

The preferred GP, OCV and OSP are those for which the fewest serious obstacles can be seen, rather than the one that scores highest on any one criterion.

Viruses are preferred over bacterial cells and spores (cp. LUIT85 and references cited therein). The virus is preferably a DNA virus with a genome size of 2 kb to 10 kb, such as (but not limited to) the filamentous (Ff) phage M13, fd, and f1 (inter alia see RASC86, BOEK80, BOEK82, DAYL88, GRAY81b, KUHN88, LOPE85, WEBS85, MARV75, MARV80, MOSE82, CRIS84, SMIT88a, SMIT88b); the IncN-specific phage Ike and If1 (NAKA81, PEET85, PEET87, THOM83, THOM88a); the IncP-specific Pseudomonas aeruginosa phage Pf1 (THOM83, THOM88a) and Pf3 (LUIT83, LUIT85, LUIT87, THOM88a); and the Xanthomonas oryzae phage Xf (THOM83, THOM88a). Filamentous phage are especially preferred.

Preferred OSPs for several GPs are given in Table 2. References to osp-ipbd fusions in this section should be taken to apply, mutatis mutandis, to osp-pbd and osp-sbd fusions as well.

The species chosen as a GP should have a well-characterized genetic system and strains defective in genetic recombination should be available. The chosen strain may need to be manipulated to prevent changes of its physiological state that would alter the number or type of proteins or other molecules on the cell surface during the affinity separation procedure.

IV.B. Phages for Use as GPs

Unlike the case for bacterial cells and spores, choice of a phage depends strongly on knowledge of the 3 D structure of an OSP and how it interacts with other proteins in the capsid. This does not mean that we need atomic resolution of the OSP, but that we need to know which segments of the OSP interact to make the viral coat and which segments are not constrained by structural or functional roles. The size of the phage genome and the packaging mechanism are also important because the phage genome itself is the cloning vector. The osp-ipbd gene is inserted into the phage genome; therefore: 1) the genome of the phage must allow introduction of the osp-ipbd gene either by tolerating additional genetic material or by having replaceable genetic material; 2) the virion must be capable of packaging the genome after accepting the insertion or substitution of genetic material, and 3) the display of the OSP-IPBD protein on the phage surface must not disrupt virion structure sufficiently to interfere with phage propagation.

The morphogenetic pathway of the phage determines the environment in which the IPBD will have opportunity to fold. Periplasmically assembled phage are preferred when IPBDs contain essential disulfides, as such IPBDs may not fold within a cell (these proteins may fold after the phage is released from the cell). Intracellularly assembled phage are preferred when the IPBD needs large or insoluble prosthetic groups (such as Fe₄S₄ clusters), since the IPBD may not fold if secreted because the prosthetic group is lacking.

When variegation is introduced in Part II, multiple infections could generate hybrid GPs that carry the gene for one PBD but have at least some copies of a different PBD on their surfaces; it is preferable to minimize this possibility by infecting cells with phage under conditions resulting in a low multiplicity of infection (MOI).
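The effect of MOI on the frequency of multiply infected cells can be estimated with a Poisson model. This model is an assumption introduced here for illustration (phage adsorbing independently and at random to cells); the function name is hypothetical.

```python
import math


def fraction_multiply_infected(moi):
    """Among infected cells, the fraction that received two or more phage,
    assuming the number of phage per cell is Poisson-distributed."""
    if moi <= 0:
        raise ValueError("moi must be positive")
    p0 = math.exp(-moi)      # cells receiving no phage
    p1 = moi * p0            # cells receiving exactly one phage
    return (1.0 - p0 - p1) / (1.0 - p0)


if __name__ == "__main__":
    for moi in (0.01, 0.1, 1.0):
        print(moi, round(fraction_multiply_infected(moi), 4))
```

Under this model, at an MOI of 0.1 roughly one infected cell in twenty carries more than one phage, whereas at an MOI of 1 the figure exceeds four in ten, which illustrates why a low MOI limits the production of hybrid GPs.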

Bacteriophages are excellent candidates for GPs because there is little or no enzymatic activity associated with intact mature phage, and because the genes are inactive outside a bacterial host, rendering the mature phage particles metabolically inert.

The filamentous phages (e.g., M13) are of particular interest.

For a given bacteriophage, the preferred OSP is usually one that is present on the phage surface in the largest number of copies, as this allows the greatest flexibility in varying the ratio of OSP-IPBD to wild type OSP and also gives the highest likelihood of obtaining satisfactory affinity separation. Moreover, a protein present in only one or a few copies usually performs an essential function in morphogenesis or infection; mutating such a protein by addition or insertion is likely to result in reduction in viability of the GP. Nevertheless, an OSP such as M13 gIII protein may be an excellent choice as OSP to cause display of the PBD.

It is preferred that the wild-type osp gene be preserved. The ipbd gene fragment may be inserted either into a second copy of the recipient osp gene or into a novel engineered osp gene. It is preferred that the osp-ipbd gene be placed under control of a regulated promoter. Our process forces the evolution of the PBDs derived from IPBD so that some of them develop a novel function, viz. binding to a chosen target. Placing the gene that is subject to evolution on a duplicate gene is an imitation of the widely-accepted scenario for the evolution of protein families. It is now generally accepted that gene duplication is the first step in the evolution of a protein family from an ancestral protein. By having two copies of a gene, the affected physiological process can tolerate mutations in one of the genes. This process is well understood and documented for the globin family (cf. DICK83, p65ff, and CREI84, p117-125).

The user must choose a site in the candidate OSP gene for inserting an ipbd gene fragment. The coats of most bacteriophage are highly ordered. Filamentous phage can be described by a helical lattice; isometric phage, by an icosahedral lattice. Each monomer of each major coat protein sits on a lattice point and makes defined interactions with each of its neighbors. Proteins that fit into the lattice by making some, but not all, of the normal lattice contacts are likely to destabilize the virion by: a) aborting formation of the virion, b) making the virion unstable, or c) leaving gaps in the virion so that the nucleic acid is not protected. Thus in bacteriophage, unlike the cases of bacteria and spores, it is important to retain in engineered OSP-IPBD fusion proteins those residues of the parental OSP that interact with other proteins in the virion. For M13 gVIII, we retain the entire mature protein, while for M13 gIII, it might suffice to retain the last 100 residues (or even fewer). Such a truncated gIII protein would be expressed in parallel with the complete gIII protein, as gIII protein is required for phage infectivity.

Il'ichev et al. (ILIC89) have reported viable phage having alterations in gene VIII. In one case, a point mutation changed one amino acid near the amino terminus of the mature gVIII protein from GLU to ASP. In the other case, five amino acids were inserted at the site of the first mutation. They suggested that similar constructions could be used for vaccines. They did not report on any binding properties of the modified phage, nor did they suggest mutagenizing the inserted material. Furthermore, they did not insert a binding domain, nor did they suggest inserting such a domain.

Further considerations on the design of the ipbd::osp gene are discussed in section IV.F.

Filamentous Phage

Compared to other bacteriophage, filamentous phage in general are attractive and M13 in particular is especially attractive because: 1) the 3 D structure of the virion is known; 2) the processing of the coat protein is well understood; 3) the genome is expandable; 4) the genome is small; 5) the sequence of the genome is known; 6) the virion is physically resistant to shear, heat, cold, urea, guanidinium Cl, low pH, and high salt; 7) the phage is a sequencing vector so that sequencing is especially easy; 8) antibiotic-resistance genes have been cloned into the genome with predictable results (HINE80); 9) it is easily cultured and stored (FRIT85), with no unusual or expensive media requirements for the infected cells; 10) it has a high burst size, each infected cell yielding 100 to 1000 M13 progeny after infection; and 11) it is easily harvested and concentrated (SALI64, FRIT85).

The filamentous phage include M13, f1, fd, If1, Ike, Xf, Pf1, and Pf3.

The entire life cycle of the filamentous phage M13, a common cloning and sequencing vector, is well understood. M13 and f1 are so closely related that we consider the properties of each relevant to both (RASC86); any differentiation is for historical accuracy. The genetic structure (the complete sequence (SCHA78), the identity and function of the ten genes, and the order of transcription and location of the promoters) of M13 is well known as is the physical structure of the virion (BANN81, BOEK80, CHAN79, ITOK79, KAPL78, KUHN85b, KUHN87, MAKO80, MARV78, MESS78, OHKA81, RASC86, RUSS81, SCHA78, SMIT85, WEBS78, and ZIMM82); see RASC86 for a recent review of the structure and function of the coat proteins. Because the genome is small (6423 bp), cassette mutagenesis is practical on RF M13 (AUSU87), as is single-stranded oligo-nt directed mutagenesis (FRIT85). M13 is a plasmid and transformation system in itself, and an ideal sequencing vector. M13 can be grown on Rec⁻ strains of E. coli. The M13 genome is expandable (MESS78, FRIT85) and M13 does not lyse cells. Because the M13 genome is extruded through the membrane and coated by a large number of identical protein molecules, it can be used as a cloning vector (WATS87 p278, and MESS77). Thus we can insert extra genes into M13 and they will be carried along in a stable manner.

Marvin and collaborators (MARV78, MAKO80, BANN81) have determined an approximate 3 D virion structure of f1 by a combination of genetics, biochemistry, and X-ray diffraction from fibers of the virus. FIG. 4 is drawn after the model of Banner et al. (BANN81) and shows only the Cα atoms of the protein. The apparent holes in the cylindrical sheath are actually filled by protein side groups so that the DNA within is protected. The amino terminus of each protein monomer is to the outside of the cylinder, while the carboxy terminus is at smaller radius, near the DNA. Although other filamentous phages (e.g. Pf1 or Ike) have different helical symmetry, all have coats composed of many short α-helical monomers with the amino terminus of each monomer on the virion surface.

The major coat protein is encoded by gene VIII. The 50 amino acid mature gene VIII coat protein is synthesized as a 73 amino acid precoat (ITOK79). The first 23 amino acids constitute a typical signal-sequence which causes the nascent polypeptide to be inserted into the inner cell membrane. Whether the precoat inserts into the membrane by itself or through the action of host secretion components, such as SecA and SecY, remains controversial, but has no effect on the operation of the present invention.

An E. coli signal peptidase (SP-I) recognizes amino acids 18, 21, and 23, and, to a lesser extent, residue 22, and cuts between residues 23 and 24 of the precoat (KUHN85a, KUHN85b, OLIV87). After removal of the signal sequence, the amino terminus of the mature coat is located on the periplasmic side of the inner membrane; the carboxy terminus is on the cytoplasmic side. About 3000 copies of the mature 50 amino acid coat protein associate side-by-side in the inner membrane.

The sequence of gene VIII is known, and the amino acid sequence can be encoded on a synthetic gene placed under the lacUV5 promoter and used in conjunction with the LacI^(q) repressor. The lacUV5 promoter is induced by IPTG. Mature gene VIII protein makes up the sheath around the circular ssDNA. The 3 D structure of the f1 virion is known at medium resolution; the amino terminus of gene VIII protein is on the surface of the virion. A few modifications of gene VIII have been made and are discussed below. The 2 D structure of M13 coat protein is implicit in the 3 D structure. Mature M13 gene VIII protein has only one domain.

When the GP is M13, the gene III and the gene VIII proteins are highly preferred as OSP (see Examples I through IV). The proteins from genes VI, VII, and IX may also be used.

As discussed in the Examples, we have constructed a tripartite gene comprising:

1) DNA encoding a signal sequence directing secretion of parts (2) and (3) through the inner membrane,

2) DNA encoding the mature BPTI sequence, and

3) DNA encoding the mature M13 gVIII protein.

This gene causes BPTI to appear in active form on the surface of M13 phage.
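The arrangement of the three parts can be sketched at the protein level. This is a minimal illustration under stated assumptions, not the construction used in the Examples: the phoA signal peptide is the one quoted elsewhere in this specification and is used here only as an example signal, `IPBD_PLACEHOLDER` and `COAT_PLACEHOLDER` are hypothetical stand-ins for the mature BPTI and gVIII sequences, and cleavage is modeled simply as removal of the signal prefix.

```python
# Protein-level sketch of a signal::ipbd::coat fusion.
# Placeholder strings stand in for the real BPTI and gene VIII sequences.

PHOA_SIGNAL = "MKQSTIALALLPLLFTPVTKA"  # phoA signal peptide quoted in the text
IPBD_PLACEHOLDER = "<IPBD>"            # hypothetical stand-in for mature BPTI
COAT_PLACEHOLDER = "<COAT>"            # hypothetical stand-in for mature gVIII protein


def build_precursor(signal: str, ipbd: str, coat: str) -> str:
    """Concatenate the three parts in amino-to-carboxy order."""
    return signal + ipbd + coat


def remove_signal(precursor: str, signal: str) -> str:
    """Model signal-peptidase cleavage as removal of the signal prefix."""
    if not precursor.startswith(signal):
        raise ValueError("precursor does not begin with the signal peptide")
    return precursor[len(signal):]


precursor = build_precursor(PHOA_SIGNAL, IPBD_PLACEHOLDER, COAT_PLACEHOLDER)
mature = remove_signal(precursor, PHOA_SIGNAL)
print(mature)
```

In this sketch the processed product begins with the IPBD and ends with the coat protein, which corresponds to attachment of the binding domain at the amino terminus of the mature coat protein.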

The gene VIII protein is a preferred OSP because it is present in many copies and because its location and orientation in the virion are known (BANN81). Preferably, the PBD is attached to the amino terminus of the mature M13 coat protein. Had direct fusion of PBD to M13 CP failed to cause PBD to be displayed on the surface of M13, we would have varied part of the mini-protein sequence and/or inserted short random or nonrandom spacer sequences between mini-protein and M13 CP. The 3D model of f1 indicates strongly that fusing IPBD to the amino terminus of M13 CP is more likely to yield a functional chimeric protein than fusion at any other site.

Similar constructions could be made with other filamentous phage. Pf3 is a well known filamentous phage that infects Pseudomonas aeruginosa cells that harbor an IncP-1 plasmid. The entire genome has been sequenced (LUIT85) and the genetic signals involved in replication and assembly are known (LUIT87). The major coat protein of Pf3 is unusual in having no signal peptide to direct its secretion. The sequence has charged residues ASP₇, ARG₃₇, LYS₄₀, and PHE₄₄-COO⁻, which is consistent with the amino terminus being exposed. Thus, to cause an IPBD to appear on the surface of Pf3, we construct a tripartite gene comprising:

1) a signal sequence known to cause secretion in P. aeruginosa (preferably known to cause secretion of IPBD) fused in-frame to,

2) a gene fragment encoding the IPBD sequence, fused in-frame to,

3) DNA encoding the mature Pf3 coat protein.

Optionally, DNA encoding a flexible linker of one to 10 amino acids is introduced between the ipbd gene fragment and the Pf3 coat-protein gene. Optionally, DNA encoding the recognition site for a specific protease, such as tissue plasminogen activator or blood clotting Factor Xa, is introduced between the ipbd gene fragment and the Pf3 coat-protein gene. Amino acids that form the recognition site for a specific protease may also serve the function of a flexible linker. This tripartite gene is introduced into Pf3 so that it does not interfere with expression of any Pf3 genes. To reduce the possibility of genetic recombination, part (3) is designed to have numerous silent mutations relative to the wild-type gene. Once the signal sequence is cleaved off, the IPBD is in the periplasm and the mature coat protein acts as an anchor and phage-assembly signal. It matters not that this fusion protein comes to rest anchored in the lipid bilayer by a route different from the route followed by the wild-type coat protein.

The amino-acid sequence of M13 pre-coat (SCHA78), called AA_seq1, is ##STR3## The single-letter codes for amino acids and the codes for ambiguous DNA are given in Table 1. The best site for inserting a novel protein domain into M13 CP is after A23 because SP-I cleaves the precoat protein after A23, as indicated by the arrow. Proteins that can be secreted will appear connected to mature M13 CP at its amino terminus. Because the amino terminus of mature M13 CP is located on the outer surface of the virion, the introduced domain will be displayed on the outside of the virion. The uncertainty of the mechanism by which M13 CP appears in the lipid bilayer raises the possibility that direct insertion of bpti into gene VIII may not yield a functional fusion protein. It may be necessary to change the signal sequence of the fusion to, for example, the phoA signal sequence (MKQSTIALALLPLLFTPVTKA . . . ). Marks et al. (MARK86) showed that the phoA signal peptide could direct mature BPTI to the E. coli periplasm.

Another way to display the IPBD is to express it as a domain of the product of a chimeric gene containing part or all of gene III. Gene III encodes one of the minor coat proteins of M13. Genes VI, VII, and IX also encode minor coat proteins. Each of these minor proteins is present in about 5 copies per virion and is related to morphogenesis or infection. In contrast, the major coat protein is present in more than 2500 copies per virion. The gene VI, VII, and IX proteins are present at the ends of the virion; these three proteins are not post-translationally processed (RASC86).

The single-stranded circular phage DNA associates with about five copies of the gene III protein and is then extruded through the patch of membrane-associated coat protein in such a way that the DNA is encased in a helical sheath of protein (WEBS78). The DNA does not base pair (that would impose severe restrictions on the virus genome); rather the bases intercalate with each other independent of sequence.

Smith (SMIT85) and de la Cruz et al. (DELA88) have shown that insertions into gene III cause novel protein domains to appear on the virion outer surface. The mini-protein's gene may be fused to gene III at the site used by Smith and by de la Cruz et al., at a codon corresponding to another domain boundary or to a surface loop of the protein, or to the amino terminus of the mature protein.

All published works use a vector containing a single modified gene III of fd. Thus, all five copies of the gIII protein are identically modified. Gene III is quite large (1272 b.p. or about 20% of the phage genome) and it is uncertain whether a duplicate of the whole gene can be stably inserted into the phage. Furthermore, all five copies of gIII protein are at one end of the virion. When bivalent target molecules (such as antibodies) bind a pentavalent phage, the resulting complex may be irreversible. Irreversible binding of the GP to the target greatly interferes with affinity enrichment of the GPs that carry the genetic sequences encoding the novel polypeptide having the highest affinity for the target.

To reduce the likelihood of formation of irreversible complexes, we may use a second, synthetic gene that encodes carboxy-terminal parts of the gene III protein. We might, for example, engineer a gene that consists of (from 5' to 3'):

1) a promoter (preferably regulated),

2) a ribosome-binding site,

3) an initiation codon,

4) a functional signal peptide directing secretion of parts (5) and (6) through the inner membrane,

5) DNA encoding an IPBD,

6) DNA encoding residues 275 through 424 of M13 gIII protein,

7) a translation stop codon, and

8) (optionally) a transcription stop signal.
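The coding portion of such a gene, parts (3) through (7), must form one uninterrupted reading frame. The following is a minimal sketch of that check, assuming the standard genetic code; the function names and every toy sequence are hypothetical placeholders, not sequences from this specification.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon ('*' marks a stop)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def check_cassette(parts: list[tuple[str, str]]) -> str:
    """Join the coding parts and verify a single open reading frame.

    Returns the encoded protein without the terminal stop.
    """
    orf = "".join(seq for _, seq in parts)
    if len(orf) % 3 != 0:
        raise ValueError("coding parts are not in frame")
    protein = translate(orf)
    if not protein.startswith("M"):
        raise ValueError("no initiation codon at the 5' end")
    if protein.find("*") != len(protein) - 1:
        raise ValueError("stop codon missing or premature")
    return protein[:-1]


# Toy coding parts; every sequence here is a made-up placeholder.
toy_parts = [
    ("initiation codon", "ATG"),
    ("signal peptide", "AAACAG"),  # encodes K, Q
    ("ipbd", "TGTGGT"),            # encodes C, G
    ("gIII fragment", "GCTGAA"),   # encodes A, E
    ("stop codon", "TAA"),
]
print(check_cassette(toy_parts))
```

The promoter, ribosome-binding site, and transcription stop signal lie outside the reading frame and are therefore not part of this check.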

We leave the wild-type gene III in place so that some unaltered gene III protein will be present. Alternatively, we may use gene VIII protein as the OSP and regulate the osp::ipbd fusion so that only one or a few copies of the fusion protein appear on the phage.

M13 gene VI, VII, and IX proteins are not processed after translation. The route by which these proteins are assembled into the phage has not been reported. These proteins are necessary for normal morphogenesis and infectivity of the phage. Whether these molecules (gene VI protein, gene VII protein, and gene IX protein) attach themselves to the phage: a) from the cytoplasm, b) from the periplasm, or c) from within the lipid bilayer, is not known. One could use any of these proteins to introduce an IPBD onto the phage surface by one of the constructions:

1) ipbd::pmcp,

2) pmcp::ipbd,

3) signal::ipbd::pmcp, and

4) signal::pmcp::ipbd.

where ipbd represents DNA coding on expression for the initial potential binding domain; pmcp represents DNA coding for one of the phage minor coat proteins, VI, VII, or IX; signal represents DNA coding for a functional secretion signal peptide, such as the phoA signal (MKQSTIALALLPLLFTPVTKA); and "::" represents in-frame genetic fusion. The indicated fusions are placed downstream of a known promoter, preferably a regulated promoter such as lacUV5, tac, or trp. Fusions (1) and (2) are appropriate when the minor coat protein attaches to the phage from the cytoplasm or by autonomous insertion into the lipid bilayer. Fusion (1) is appropriate if the amino terminus of the minor coat protein is free and (2) is appropriate if the carboxy terminus is free. Fusions (3) and (4) are appropriate if the minor coat protein attaches to the phage from the periplasm or from within the lipid bilayer. Fusion (3) is appropriate if the amino terminus of the minor coat protein is free and (4) is appropriate if the carboxy terminus is free.
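The choice among the four constructions can be written as a small decision rule. This sketch only restates the conditions given above; the function name and the string labels for the attachment routes are hypothetical.

```python
# Routes for which no secretion signal is added, per the rules stated above.
NO_SIGNAL_ROUTES = {"cytoplasm", "autonomous_insertion"}
# Routes for which a secretion signal is fused upstream.
SIGNAL_ROUTES = {"periplasm", "within_bilayer"}


def choose_fusion(route: str, free_terminus: str) -> str:
    """Return the construction suited to an attachment route and free terminus.

    route: one of the labels in NO_SIGNAL_ROUTES or SIGNAL_ROUTES
    free_terminus: 'amino' or 'carboxy' (of the minor coat protein)
    """
    if free_terminus == "amino":
        core = "ipbd::pmcp"
    elif free_terminus == "carboxy":
        core = "pmcp::ipbd"
    else:
        raise ValueError("free_terminus must be 'amino' or 'carboxy'")
    if route in NO_SIGNAL_ROUTES:
        return core
    if route in SIGNAL_ROUTES:
        return "signal::" + core
    raise ValueError("unknown attachment route")


print(choose_fusion("periplasm", "amino"))
```

Because the attachment route of the gene VI, VII, and IX proteins is not known, in practice all four constructions would be candidates; the rule merely shows which one corresponds to each hypothesis.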

Bacteriophage ΦX174

The bacteriophage ΦX174 is a very small icosahedral virus which has been thoroughly studied by genetics, biochemistry, and electron microscopy (See The Single-Stranded DNA Phages (DENH78)). To date, no proteins from ΦX174 have been studied by X-ray diffraction. ΦX174 is not used as a cloning vector because ΦX174 can accept very little additional DNA; the virus is so tightly constrained that several of its genes overlap. Chambers et al. (CHAM82) showed that mutants in gene G are rescued by the wild-type G gene carried on a plasmid so that the host supplies this protein.

Three gene products of ΦX174 are present on the outside of the mature virion: F (capsid), G (major spike protein, 60 copies per virion), and H (minor spike protein, 12 copies per virion). The G protein comprises 175 amino acids, while H comprises 328 amino acids. The F protein interacts with the single-stranded DNA of the virus. The proteins F, G, and H are translated from a single mRNA in virus-infected cells. If the G protein is supplied from a plasmid in the host, then the viral g gene is no longer essential. We introduce one or more stop codons into g so that no G is produced from the viral gene. We fuse a pbd gene fragment to h, either at the 3' or 5' terminus. We eliminate an amount of the viral g gene equal to the size of pbd so that the size of the genome is unchanged.

Large DNA Phages

Phage such as λ or T4 have much larger genomes than do M13 or ΦX174. Large genomes are less conveniently manipulated than small genomes. Phage λ has such a large genome that cassette mutagenesis is not practicable. One cannot use annealing of a mutagenic oligonucleotide either, because there is no ready supply of single-stranded λ DNA. (λ DNA is packaged as double-stranded DNA.) Phage such as λ and T4 have more complicated 3D capsid structures than M13 or ΦX174, with more OSPs to choose from. Intracellular morphogenesis of phage λ could cause protein domains that contain disulfide bonds in their folded forms not to fold.

Phage λ virions and phage T4 virions form intracellularly, so that IPBDs requiring large or insoluble prosthetic groups might fold on the surfaces of these phage.

RNA Phages

RNA phage are not preferred because manipulation of RNA is much less convenient than is the manipulation of DNA. If the RNA phage MS2 were modified to make room for an osp-ipbd gene and if a message containing the A protein binding site and the gene for a chimera of coat protein and a PBD were produced in a cell that also contained A protein and wild-type coat protein (both produced from regulated genes on a plasmid), then the RNA coding for the chimeric protein would get packaged. A package comprising RNA encapsulated by proteins encoded by that RNA satisfies the major criterion that the genetic message inside the package specifies something on the outside. The particles by themselves are not viable unless the modified A protein is functional. After isolating the packages that carry an SBD, we would need to: 1) separate the RNA from the protein capsid; 2) reverse transcribe the RNA into DNA, using AMV or MMTV reverse transcriptase, and 3) use Thermus aquaticus DNA polymerase for 25 or more cycles of Polymerase Chain Reaction(TM) to amplify the osp-sbd DNA until there is enough to subclone the recovered genetic message into a plasmid for sequencing and further work.
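The figure of 25 or more cycles can be put in perspective with a simple upper bound. The sketch below assumes ideal doubling in every cycle, which real reactions do not achieve; the function name is hypothetical.

```python
def ideal_pcr_copies(starting_copies: int, cycles: int) -> int:
    """Upper bound on copy number, assuming perfect doubling every cycle."""
    if starting_copies < 0 or cycles < 0:
        raise ValueError("arguments must be non-negative")
    return starting_copies * 2 ** cycles


# One template molecule after 25 ideal cycles.
print(ideal_pcr_copies(1, 25))  # 33554432
```

Under this idealization a single recovered template yields on the order of 10⁷ copies after 25 cycles, which is why amplification from a small number of isolated packages is plausible.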

Alternatively, helper phage could be used to rescue the isolated phage. In one of these ways we can recover a sequence that codes for an SBD having desirable binding properties.

IV.C. Bacterial Cells as Genetic Packages

One may choose any well-characterized bacterial strain which (1) may be grown in culture, (2) may be engineered to display PBDs on its surface, and (3) is compatible with affinity selection.

Among bacterial cells, the preferred genetic packages are Salmonella typhimurium, Bacillus subtilis, Pseudomonas aeruginosa, Vibrio cholerae, Klebsiella pneumoniae, Neisseria gonorrhoeae, Neisseria meningitidis, Bacteroides nodosus, Moraxella bovis, and especially Escherichia coli. The potential binding mini-protein may be expressed as an insert in a chimeric bacterial outer surface protein (OSP). All bacteria exhibit proteins on their outer surfaces. Works on the localization of OSPs and the methods of determining their structure include: CALA90, HEIJ90, EHRM90, BENZ88a, BENZ88b, MAN088, BAKE87, RAND87, HANC87, HENR87, NAKA86b, MAN086, SILH85, TOMM85, NIKA84, LUGT83, and BECK83.

In E. coli, LamB is a preferred OSP. As discussed below, there are a number of very good alternatives in E. coli and there are very good alternatives in other bacterial species. There are also methods for determining the topology of OSPs so that it is possible to systematically determine where to insert an ipbd into an osp gene to obtain display of an IPBD on the surface of any bacterial species.

In view of the extensive knowledge of E. coli, a strain of E. coli, defective in recombination, is the strongest candidate as a bacterial GP.

Oliver has reviewed mechanisms of protein secretion in bacteria (OLIV85a and OLIV87). Nikaido and Vaara (NIKA87), Benz (BENZ88b), and Baker et al. (BAKE87) have reviewed mechanisms by which proteins become localized to the outer membrane of gram-negative bacteria. While most bacterial proteins remain in the cytoplasm, others are transported to the periplasmic space (which lies between the plasma membrane and the cell wall of gram-negative bacteria), or are conveyed and anchored to the outer surface of the cell. Still others are exported (secreted) into the medium surrounding the cell. Those characteristics of a protein that are recognized by a cell and that cause it to be transported out of the cytoplasm and displayed on the cell surface will be termed "outer-surface transport signals".

Gram-negative bacteria have outer-membrane proteins (OMPs) that form a subset of OSPs. Many OMPs span the membrane one or more times. The signals that cause OMPs to localize in the outer membrane are encoded in the amino acid sequence of the mature protein. Outer membrane proteins of bacteria are initially expressed in a precursor form including a so-called signal peptide. The precursor protein is transported to the inner membrane, and the signal peptide moiety is extruded into the periplasmic space. There, it is cleaved off by a "signal peptidase", and the remaining "mature" protein can now enter the periplasm. Once there, other cellular mechanisms recognize structures in the mature protein which indicate that its proper place is on the outer membrane, and transport it to that location.

It is well known that the DNA coding for the leader or signal peptide from one protein may be attached to the DNA sequence coding for another protein, protein X, to form a chimeric gene whose expression causes protein X to appear free in the periplasm (BECK83, INOU86 Ch10, LEEC86, MARK86, and BOQU87). That is, the leader causes the chimeric protein to be secreted through the lipid bilayer; once in the periplasm, it is cleaved off by the signal peptidase SP-I.

The use of export-permissive bacterial strains (LISS85, STAD89) increases the probability that a signal-sequence-fusion will direct the desired protein to the cell surface. Liss et al. (LISS85) showed that the mutation prlA4 makes E. coli more permissive with respect to signal sequences. Similarly, Stader et al. (STAD89) found a strain that bears a prlG mutation and that permits export of a protein that is blocked from export in wild-type cells. Such export-permissive strains are preferred.

OSP-IPBD fusion proteins need not fill a structural role in the outer membranes of Gram-negative bacteria because parts of the outer membranes are not highly ordered. For large OSPs there is likely to be one or more sites at which osp can be truncated and fused to ipbd such that cells expressing the fusion will display IPBDs on the cell surface. Fusions of fragments of omp genes with fragments of an x gene have led to X appearing on the outer membrane (CHAR88b, BENS84, CLEM81). When such fusions have been made, we can design an osp-ipbd gene by substituting ipbd for x in the DNA sequence. Otherwise, a successful OMP-IPBD fusion is preferably sought by fusing fragments of the best omp to an ipbd, expressing the fused gene, and testing the resultant GPs for display-of-IPBD phenotype. We use the available data about the OMP to pick the point or points of fusion between omp and ipbd to maximize the likelihood that IPBD will be displayed. (Spacer DNA encoding flexible linkers, made, e.g., of GLY, SER, and ASN, may be placed between the osp- and ipbd-derived fragments to facilitate display.) Alternatively, we truncate osp at several sites or in a manner that produces osp fragments of variable length and fuse the osp fragments to ipbd; cells expressing the fusion are then screened or selected for display of IPBDs on the cell surface. Freudl et al. (FREU89) have shown that fragments of OSPs (such as OmpA) above a certain size are incorporated into the outer membrane. An additional alternative is to include short segments of random DNA in the fusion of omp fragments to ipbd and then screen or select the resulting variegated population for members exhibiting the display-of-IPBD phenotype.
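The truncation alternative can be sketched as the enumeration of in-frame 5' fragments of osp, each fused to ipbd with an optional spacer. This is a minimal illustration; the function name, the minimum-length parameter, and all toy sequences are hypothetical, and a real library would be built by the cloning methods described elsewhere in this specification.

```python
def truncation_fusions(osp_gene: str, ipbd_gene: str,
                       min_codons: int, linker: str = "") -> list[str]:
    """Fuse every in-frame 5' fragment of osp_gene (>= min_codons) to ipbd_gene."""
    if len(osp_gene) % 3 or len(ipbd_gene) % 3 or len(linker) % 3:
        raise ValueError("all parts must consist of whole codons")
    total_codons = len(osp_gene) // 3
    return [
        osp_gene[: 3 * n] + linker + ipbd_gene
        for n in range(min_codons, total_codons + 1)
    ]


# Toy placeholders: a 4-codon "osp", a 2-codon "ipbd", a GLY-SER spacer.
library = truncation_fusions("ATGAAACCCGGG", "TGTTGT", min_codons=2,
                             linker="GGTTCT")
for fusion in library:
    print(fusion)
```

Each member of the resulting population keeps ipbd in the reading frame set by the osp fragment, so the members differ only in where osp is truncated; display must still be established by screening or selection.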

In E. coli, the LamB protein is a well understood OSP and can be used (BENS84, CHAR90, RONC90, VAND90, CHAP90, MOLL90, CHAR88b, CHAR88c, CLEM81, DARG88, FERE82a, FERE82b, FERE83, FERE84, FERE86a, FERE86b, FERE89a, FERE89b, GEHR87, HALL82, NAKA86a, STAD86, HEIN88, BENS87b, BENS87c, BOUG84, BOUL86a, CHAR84). The E. coli LamB has been expressed in functional form in S. typhimurium (DEVR84, BARB85, HARK87), V. cholerae (HARK86), and K. pneumoniae (DEVR84, WEHM89), so that one could display a population of PBDs in any of these species as a fusion to E. coli LamB. K. pneumoniae expresses a maltoporin similar to LamB (WEHM89) which could also be used. In P. aeruginosa, the D1 protein (a homologue of LamB) can be used (TRIA88).

LamB of E. coli is a porin for maltose and maltodextrin transport, and serves as the receptor for adsorption of bacteriophages λ and K10. LamB is transported to the outer membrane if a functional N-terminal sequence is present; further, the first 49 amino acids of the mature sequence are required for successful transport (BENS84). As with other OSPs, LamB of E. coli is synthesized with a typical signal-sequence which is subsequently removed. Homology between parts of LamB protein and other outer membrane proteins OmpC, OmpF, and PhoE has been detected (NIKA84), including homology between LamB amino acids 39-49 and sequences of the other proteins. These subsequences may label the proteins for transport to the outer membrane.

The amino acid sequence of LamB is known (CLEM81), and a model has been developed of how it anchors itself to the outer membrane (reviewed by, among others, BENZ88b). The locations of its maltose and phage binding domains are also known (HEIN88). Using this information, one may identify several strategies by which a PBD insert may be incorporated into LamB to provide a chimeric OSP which displays the PBD on the bacterial outer membrane.

When the PBDs are to be displayed by a chimeric transmembrane protein like LamB, the PBD could be inserted into a loop normally found on the surface of the cell (cp. BECK83, MAN086). Alternatively, we may fuse a 5' segment of the osp gene to the ipbd gene fragment; the point of fusion is picked to correspond to a surface-exposed loop of the OSP and the carboxy terminal portions of the OSP are omitted. In LamB, it has been found that up to 60 amino acids may be inserted (CHAR88b) with display of the foreign epitope resulting; the structural features of OmpC, OmpA, OmpF, and PhoE are so similar that one expects similar behavior from these proteins.

It should be noted that while LamB may be characterized as a binding protein, it is used in the present invention to provide an OSTS; its binding domains are not variegated.

Other bacterial outer surface proteins, such as OmpA, OmpC, OmpF, PhoE, and pilin, may be used in place of LamB and its homologues. OmpA is of particular interest because it is very abundant and because homologues are known in a wide variety of gram-negative bacterial species. Baker et al. (BAKE87) review assembly of proteins into the outer membrane of E. coli and cite a topological model of OmpA (VOGE86) that predicts that residues 19-32, 62-73, 105-118, and 147-158 are exposed on the cell surface. Insertion of an ipbd-encoding fragment at about codon 111 or at about codon 152 is likely to cause the IPBD to be displayed on the cell surface. Concerning OmpA, see also MACI88 and MAN088. Porin Protein F of Pseudomonas aeruginosa has been cloned and has sequence homology to OmpA of E. coli (DUCH88). Although this homology is not sufficient to allow prediction of surface-exposed residues on Porin Protein F, the methods used to determine the topological model of OmpA may be applied to Porin Protein F. Works related to use of OmpA as an OSP include BECK80 and MACI88.
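The relationship between the suggested OmpA insertion codons and the predicted exposed segments can be checked directly. The segment boundaries and codon numbers below are those given above; the function name is hypothetical.

```python
from typing import Optional

# Surface-exposed residue ranges predicted by the OmpA model cited above.
OMPA_EXPOSED = [(19, 32), (62, 73), (105, 118), (147, 158)]


def exposed_segment(codon: int,
                    segments: list[tuple[int, int]] = OMPA_EXPOSED
                    ) -> Optional[tuple[int, int]]:
    """Return the exposed segment containing `codon`, or None if buried."""
    for start, end in segments:
        if start <= codon <= end:
            return (start, end)
    return None


for site in (111, 152):
    print(site, exposed_segment(site))
```

Both suggested codons fall near the middle of a predicted exposed segment (105-118 and 147-158, respectively), which is the rationale for preferring them.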

Misra and Benson (MISR88a, MISR88b) disclose a topological model of E. coli OmpC that predicts that, among others, residues GLY164 and LEU250 are exposed on the cell surface. Thus insertion of an ipbd gene fragment at about codon 164 or at about codon 250 of the E. coli ompC gene or at corresponding codons of the S. typhimurium ompC gene is likely to cause IPBD to appear on the cell surface. The ompC genes of other bacterial species may be used. Other works related to OmpC include CATR87 and CLIC88.

OmpF of E. coli is a very abundant OSP, ≥10⁴ copies/cell. Pages et al. (PAGE90) have published a model of OmpF indicating seven surface-exposed segments. Fusion of an ipbd gene fragment, either as an insert or to replace the 3' part of ompF, in one of the indicated regions is likely to produce a functional ompF::ipbd gene the expression of which leads to display of IPBD on the cell surface. In particular, fusion at about codon 111, 177, 217, or 245 should lead to a functional ompF::ipbd gene. Concerning OmpF, see also REID88b, PAGE88, BENS88, TOMM82, and SODE85.

Pilus proteins are of particular interest because piliated cells express many copies of these proteins and because several species (N. gonorrhoeae, P. aeruginosa, Moraxella bovis, Bacteroides nodosus, and E. coli) express related pilins. Getzoff and coworkers (GETZ88, PARG87, SOME85) have constructed a model of the gonococcal pilus that predicts that the protein forms a four-helix bundle having structural similarities to tobacco mosaic virus protein and myohemerythrin. On this model, both the amino and carboxy termini of the protein are exposed. The amino terminus is methylated. Elleman (ELLE88) has reviewed pilins of Bacteroides nodosus and other species and reports that serotype differences can be related to differences in the pilin protein and that most variation occurs in the C-terminal region. The amino-terminal portions of the pilin protein are highly conserved. Jennings et al. (JENN89) have grafted a fragment of foot-and-mouth disease virus (residues 144-159) into the B. nodosus type 4 fimbrial protein which is highly homologous to gonococcal pilin. They found that expression of the 3'-terminal fusion in P. aeruginosa led to a viable strain that makes detectable amounts of the fusion protein. Jennings et al. did not vary the foreign epitope nor did they suggest any variation. They inserted a GLY-GLY linker between the last pilin residue and the first residue of the foreign epitope to provide a "flexible linker". Thus a preferred place to attach an IPBD is the carboxy terminus. The exposed loops of the bundle could also be used, although the particular internal fusions tested by Jennings et al. (JENN89) appeared to be lethal in P. aeruginosa. Concerning pilin, see also MCKE85 and ORND85.

Judd (JUDD86, JUDD85) has investigated Protein IA of N. gonorrhoeae and found that the amino terminus is exposed; thus, one could attach an IPBD at or near the amino terminus of the mature P.IA as a means to display the IPBD on the N. gonorrhoeae surface.

A model of the topology of PhoE of E. coli has been disclosed by van der Ley et al. (VAND86). This model predicts eight loops that are exposed; insertion of an IPBD into one of these loops is likely to lead to display of the IPBD on the surface of the cell. Residues 158, 201, 238, and 275 are preferred locations for insertion of an IPBD.
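The insertion sites suggested above for OmpA, OmpC, OmpF, and PhoE can be collected in one table and applied with a single helper. This is a sketch at the protein level: the site numbers come from the text, while the function name, the choice to insert immediately after the numbered residue, and the toy sequences are assumptions made for illustration.

```python
# Candidate insertion positions taken from the topological models cited above.
PREFERRED_SITES = {
    "OmpA": [111, 152],
    "OmpC": [164, 250],
    "OmpF": [111, 177, 217, 245],
    "PhoE": [158, 201, 238, 275],
}


def insert_after_residue(host: str, insert: str, residue: int,
                         linker: str = "") -> str:
    """Insert `insert` after 1-based `residue` of `host`.

    The same optional linker is placed on both sides of the insert.
    """
    if not 1 <= residue <= len(host):
        raise ValueError("residue lies outside the host sequence")
    return host[:residue] + linker + insert + linker + host[residue:]


# Toy example with made-up sequences (not real OSP or IPBD sequences).
print(insert_after_residue("ABCDEF", "xy", 3, linker="G"))
```

The text specifies each site only as "about" a given codon, so a practical design would try several neighboring positions around each entry in the table.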

Other OSPs that could be used include E. coli BtuB, FepA, FhuA, IutA, FecA, and FhuE (GUDM89), which are receptors for nutrients usually found in low abundance. The genes of all these proteins have been sequenced, but topological models are not yet available. Gudmundsdottir et al. (GUDM89) have begun the construction of such a model for BtuB and FepA by showing that certain residues of BtuB face the periplasm and by determining the functionality of various BtuB::FepA fusions. Carmel et al. (CARM90) have reported work of a similar nature for FhuA. All Neisseria species express outer surface proteins for iron transport that have been identified and, in many cases, cloned. See also MORS87 and MORS88.

Many gram-negative bacteria express one or more phospholipases. E. coli phospholipase A, product of the pldA gene, has been cloned and sequenced by de Geus et al. (DEGE84). They found that the protein appears at the cell surface without any posttranslational processing. An ipbd gene fragment can be attached at either terminus or inserted at positions predicted to encode loops in the protein. That phospholipase A arrives on the outer surface without removal of a signal sequence does not prove that a PldA::IPBD fusion protein will also follow this route. We might therefore cause a PldA::IPBD or IPBD::PldA fusion to be secreted into the periplasm by addition of an appropriate signal sequence. Thus, in addition to simple binary fusion of an ipbd fragment to one terminus of pldA, the constructions:

1) ss::ipbd::pldA

2) ss::pldA::ipbd

should be tested. Once the PldA::IPBD protein is free in the periplasm, it does not remember how it got there, and the structural features of PldA that cause it to localize on the outer surface will direct the fusion to the same destination.

IV.D. Bacterial Spores as Genetic Packages

Bacterial spores have desirable properties as GP candidates. Spores are much more resistant than vegetative bacterial cells or phage to chemical and physical agents, and hence permit the use of a great variety of affinity selection conditions. Also, Bacillus spores neither actively metabolize nor alter the proteins on their surface. Spores have the disadvantage that the molecular mechanisms that trigger sporulation are less well worked out than is the formation of M13 or the export of protein to the outer membrane of E. coli.

Bacteria of the genus Bacillus form endospores that are extremely resistant to damage by heat, radiation, desiccation, and toxic chemicals (reviewed by Losick et al. (LOSI86)). This phenomenon is attributed to extensive intermolecular crosslinking of the coat proteins. Endospores from the genus Bacillus are more stable than are exospores from Streptomyces. Bacillus subtilis forms spores in 4 to 6 hours, but Streptomyces species may require days or weeks to sporulate. In addition, genetic knowledge and manipulation is much more developed for B. subtilis than for other spore-forming bacteria. Thus Bacillus spores are preferred over Streptomyces spores. Bacteria of the genus Clostridium also form very durable endospores, but clostridia, being strict anaerobes, are not convenient to culture.

Viable spores that differ only slightly from wild-type are produced in B. subtilis even if any one of four coat proteins is missing (DON087). Moreover, plasmid DNA is commonly included in spores, and plasmid encoded proteins have been observed on the surface of Bacillus spores (DEBR86). For these reasons, we expect that it will be possible to express during sporulation a gene encoding a chimeric coat protein, without interfering materially with spore formation.

Donovan et al. have identified several polypeptide components of B. subtilis spore coat (DON087); the sequences of two complete coat proteins and amino-terminal fragments of two others have been determined. Some, but not all, of the coat proteins are synthesized as precursors and are then processed by specific proteases before deposition in the spore coat (DON087). The 12 kd coat protein, CotD, contains 5 cysteines. CotD also contains an unusually high number of histidines (16) and prolines (7). The 11 kd coat protein, CotC, contains only one cysteine and one methionine. CotC has a very unusual amino-acid sequence with 19 lysines (K) appearing as 9 K-K dipeptides and one isolated K. There are also 20 tyrosines (Y) of which 10 appear as 5 Y-Y dipeptides. Peptides rich in Y and K are known to become crosslinked in oxidizing environments (DEV078, WAIT83, WAIT85, WAIT86). CotC contains 16 D and E amino acids, a number that nearly equals the 19 Ks. There are no A, F, R, I, L, N, P, Q, S, or W amino acids in CotC. Neither CotC nor CotD is post-translationally cleaved, but the proteins CotA and CotB are.
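Composition statements of this kind (19 K as 9 K-K dipeptides plus one isolated K) can be tallied mechanically for any sequence. The sketch below counts runs of a chosen residue; the function name is hypothetical and the toy sequence is made up for illustration, since the CotC sequence itself is not reproduced here.

```python
import re


def count_runs(sequence: str, residue: str) -> dict[int, int]:
    """Map run length to the number of runs of `residue` in `sequence`."""
    runs: dict[int, int] = {}
    for match in re.finditer(re.escape(residue) + "+", sequence):
        length = len(match.group())
        runs[length] = runs.get(length, 0) + 1
    return runs


# Made-up toy sequence, not CotC.
toy = "KKDYYKEKKGY"
print(count_runs(toy, "K"))
print(count_runs(toy, "Y"))

# Arithmetic from the text: 9 K-K dipeptides plus one isolated K.
assert 9 * 2 + 1 == 19
```

Applied to a real coat-protein sequence, a tally of this kind gives a quick indication of how many K-K and Y-Y pairs are available for crosslinking.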

Since, in B. subtilis, some of the spore coat proteins are post-translationally processed by specific proteases, it is valuable to know the sequences of precursors and mature coat proteins so that we can avoid incorporating the recognition sequence of the specific protease into our construction of an OSP-IPBD fusion. The sequence of a mature spore coat protein contains information that causes the protein to be deposited in the spore coat; thus gene fusions that include some or all of a mature coat protein sequence are preferred for screening or selection for the display-of-IPBD phenotype.

Fusions of ipbd fragments to cotC or cotD fragments are likely to cause IPBD to appear on the spore surface. The genes cotC and cotD are preferred osp genes because CotC and CotD are not post-translationally cleaved. Subsequences from cotA or cotB could also be used to cause an IPBD to appear on the surface of B. subtilis spores, but we must take the post-translational cleavage of these proteins into account. DNA encoding IPBD could be fused to a fragment of cotA or cotB at either end of the coding region or at sites interior to the coding region. Spores could then be screened or selected for the display-of-IPBD phenotype.

The promoter of a spore coat protein is most active: a) when spore coat protein is being synthesized and deposited onto the spore and b) in the specific place that spore coat proteins are being made. The sequences of several sporulation promoters are known; coding sequences operatively linked to such promoters are expressed only during sporulation. Ray et al. (RAYC87) have shown that the G4 promoter of B. subtilis is directly controlled by RNA polymerase bound to σ^(E). To date, no Bacillus sporulation promoter has been shown to be inducible by an exogenous chemical inducer as is the lac promoter of E. coli. Nevertheless, the quantity of protein produced from a sporulation promoter can be controlled by other factors, such as the DNA sequence around the Shine-Dalgarno sequence or codon usage. Chemically inducible sporulation promoters can be developed if necessary.

IV.E. Artificial OSPs

It is generally preferable to use as the genetic package a cell, spore or virus for which an outer surface protein which can be engineered to display an IPBD has already been identified. However, the present invention is not limited to such genetic packages.

It is believed that the conditions for an outer surface transport signal in a bacterial cell or spore are not particularly stringent, i.e., a random polypeptide of appropriate length (preferably 30-100 amino acids) has a reasonable chance of providing such a signal. Thus, by constructing a chimeric gene comprising a segment encoding the IPBD linked to a segment of random or pseudorandom DNA (the potential OSTS), and placing this gene under control of a suitable promoter, there is a possibility that the chimeric protein so encoded will function as an OSP-IPBD.

This possibility is greatly enhanced by constructing numerous such genes, each having a different potential OSTS, cloning them into a suitable host, and selecting for transformants bearing the IPBD (or other marker) on their outer surface. Use of secretion-permissive mutants, such as prlA4 (LISS85) or prlG (STAD89), can increase the probability of obtaining a working OSP-IPBD.

When seeking to display an IPBD on the surface of a bacterial cell, as an alternative to choosing a natural OSP and an insertion site in the OSP, we can construct a gene (the "display probe") comprising: a) a regulatable promoter (e.g. lacUV5), b) a Shine-Dalgarno sequence, c) a periplasmic transport signal sequence, d) a fusion of the ipbd gene with a segment of random DNA (as in Kaiser et al. (KAIS87)), e) a stop codon, and f) a transcriptional terminator.

When the genetic package is a spore, we can use the approach described above for attaching an IPBD to an E. coli cell, except that: a) a sporulation promoter is used, and b) no periplasmic signal sequence should be present.

For phage, because the OSP-IPBD fulfills a structural role in the phage coat, it is unlikely that any particular random DNA sequence coupled to the ipbd gene will produce a fusion protein that fits into the coat in a functional way. Nevertheless, random DNA inserted between large fragments of a coat protein gene and the pbd gene will produce a population that is likely to contain one or more members that display the IPBD on the outside of a viable phage.

As previously stated, the purpose of the random DNA is to encode an OSTS, like that embodied in known OSPs. The fusion of ipbd and the random DNA could be in either order, but ipbd upstream is slightly preferred. Isolates from the population generated in this way can be screened for display of the IPBD. Preferably, a version of selection-through-binding is used to select GPs that display IPBD on the GP surface. Alternatively, clonal isolates of GPs may be screened for the display-of-IPBD phenotype.

The preference for ipbd upstream of the random DNA arises from consideration of the manner in which the successful GP(IPBD) will be used. The present invention contemplates introducing numerous mutations into the pbd region of the osp-pbd gene, which, depending on the variegation scheme, might include gratuitous stop codons. If pbd precedes the random DNA, then gratuitous stop codons in pbd lead to no OSP-PBD protein appearing on the cell surface. If pbd follows the random DNA, then gratuitous stop codons in pbd might lead to incomplete OSP-PBD proteins appearing on the cell surface. Incomplete proteins often are non-specifically sticky so that GPs displaying incomplete PBDs are easily removed from the population.

The random DNA may be obtained in a variety of ways. Degenerate synthetic DNA is one possibility. Alternatively, pseudorandom DNA can be generated from DNA having high sequence diversity, e.g., the genome of the organism, by partially digesting with an enzyme that cuts very often, e.g., Sau3AI. Alternatively, one could shear DNA having high sequence diversity, blunt the sheared DNA with the large fragment of E. coli DNA polymerase I (hereinafter referred to as Klenow fragment), and clone the sheared and blunted DNA into blunt sites of the vector (MANI82, p295, AUSU87).
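The partial-digestion route can be illustrated with a minimal sketch. It assumes only that Sau3AI cuts immediately 5' of its GATC recognition sequence; the function names and the toy sequence are hypothetical, only one strand is modeled, and real digests yield a size distribution that depends on enzyme concentration and time.

```python
def sau3ai_cut_positions(seq):
    """Indices at which Sau3AI would cut (just 5' of each GATC) on the given strand."""
    seq = seq.upper()
    positions = []
    i = seq.find("GATC")
    while i != -1:
        positions.append(i)
        i = seq.find("GATC", i + 1)
    return positions


def partial_digest_fragments(seq):
    """All fragments obtainable when any subset of the GATC sites is cut.

    A complete digest gives only the pieces between adjacent sites; a partial
    digest can also leave neighbouring pieces joined, which is what makes the
    resulting pool of inserts pseudorandom in length and content.
    """
    seq = seq.upper()
    bounds = [0] + [p for p in sau3ai_cut_positions(seq) if p > 0] + [len(seq)]
    fragments = {seq[a:b] for i, a in enumerate(bounds) for b in bounds[i + 1:]}
    return sorted(fragments, key=len)


toy_genome = "AAGATCTTTGATCCC"  # illustrative sequence with two GATC sites
fragments = partial_digest_fragments(toy_genome)
```

Each fragment other than those ending at the ends of the input carries a 5' GATC overhang on the modeled strand, which is why such fragments can be ligated into a compatible cohesive site of the vector.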

If random DNA and phenotypic selection or screening are used to obtain a GP(IPBD), then we clone random DNA into one of the restriction sites that was designed into the display probe. A plasmid carrying the display probe is digested with the appropriate restriction enzyme and the fragmented, random DNA is annealed and ligated by standard methods. The ligated plasmids are used to transform cells that are grown and selected for expression of the antibiotic-resistance gene. Plasmid-bearing GPs are then selected for the display-of-IPBD phenotype by the affinity selection methods described hereafter, using AfM(IPBD) as if it were the target.

As an alternative to selecting GP(IPBD)s through binding to an affinity column, we can isolate colonies or plaques and screen for successful artificial OSPs through use of one of the methods listed below for verification of the display strategy.

IV.F. Designing the Osp-ipbd Gene Insert

Genetic Construction and Expression Considerations

The (i)pbd-osp gene may be: a) completely synthetic, b) a composite of natural and synthetic DNA, or c) a composite of natural DNA fragments. The important point is that the pbd segment be easily variegated so as to encode a multitudinous and diverse family of PBDs as previously described. A synthetic ipbd segment is preferred because it allows greatest control over placement of restriction sites. Primers complementary to regions abutting the osp-ipbd gene on its 3' flank and to parts of the osp-ipbd gene that are not to be varied are needed for sequencing.

The sequences of regulatory parts of the gene are taken from the sequences of natural regulatory elements: a) promoters, b) Shine-Dalgarno sequences, and c) transcriptional terminators. Regulatory elements could also be designed from knowledge of consensus sequences of natural regulatory regions. The sequences of these regulatory elements are connected to the coding regions; restriction sites are also inserted in or adjacent to the regulatory regions to allow convenient manipulation.

The essential function of the affinity separation is to separate GPs that bear PBDs (derived from IPBD) having high affinity for the target from GPs bearing PBDs having low affinity for the target. If the elution volume of a GP depends on the number of PBDs on the GP surface, then a GP bearing many PBDs with low affinity, GP(PBD_(w)), might co-elute with a GP bearing fewer PBDs with high affinity, GP(PBD_(s)). Regulation of the osp-pbd gene preferably is such that most packages display sufficient PBD to effect a good separation according to affinity. Use of a regulatable promoter to control the level of expression of the osp-pbd gene allows fine adjustment of the chromatographic behavior of the variegated population.

Induction of synthesis of engineered genes in vegetative bacterial cells has been exercised through the use of regulated promoters such as lacUV5, trpP, or tac (MANI82). The factors that regulate the quantity of protein synthesized include: a) promoter strength (cf. HOOP87), b) rate of initiation of translation (cf. GOLD87), c) codon usage, d) secondary structure of mRNA, including attenuators (cf. LAND87) and terminators (cf. YAGE87), e) interaction of proteins with mRNA (cf. MCPH86, MILL87b, WINT87), f) degradation rates of mRNA (cf. BRAW87, KING86), and g) proteolysis (cf. GOTT87). These factors are sufficiently well understood that a wide variety of heterologous proteins can now be produced in E. coli, B. subtilis and other host cells in at least moderate quantities (SKER88, BETT88). Preferably, the promoter for the osp-ipbd gene is subject to regulation by a small chemical inducer. For example, the lac promoter and the hybrid trp-lac (tac) promoter are regulatable with isopropyl thiogalactoside (IPTG). Hereinafter, we use "XINDUCE" as a generic term for a chemical that induces expression of a gene. The promoter for the constructed gene need not come from a natural osp gene; any regulatable bacterial promoter can be used.

Transcriptional regulation of gene expression is best understood and most effective, so we focus our attention on the promoter. If transcription of the osp-ipbd gene is controlled by the chemical XINDUCE, then the number of OSP-IPBDs per GP increases for increasing concentrations of XINDUCE until a fall-off in the number of viable packages is observed or until sufficient IPBD is observed on the surface of harvested GP(IPBD)s. The attributes that affect the maximum number of OSP-IPBDs per GP are primarily structural in nature. There may be steric hindrance or other unwanted interactions between IPBDs if OSP-IPBD is substituted for every wild-type OSP. Excessive levels of OSP-IPBD may also adversely affect the solubility or morphogenesis of the GP. For cellular and viral GPs, as few as five copies of a protein having affinity for another immobilized molecule have resulted in successful affinity separations (FERE82a, FERE82b, and SMIT85).

A non-leaky promoter is preferred. Non-leakiness is useful: a) to show that affinity of GP(osp-ipbd)s for AfM(IPBD) is due to the osp-ipbd gene, and b) to allow growth of GP(osp-ipbd) in the absence of XINDUCE if the expression of osp-ipbd is disadvantageous. The lacUV5 promoter in conjunction with the LacI^(q) repressor is a preferred example.

An exemplary osp-ipbd gene has the DNA sequence shown in Table 25 and there annotated to explain the useful restriction sites and biologically important features, viz. the lacUV5 promoter, the lacO operator, the Shine-Dalgarno sequence, the amino acid sequence, the stop codons, and the trp attenuator transcriptional terminator.

The present invention is not limited to a single method of gene design. The osp-ipbd gene need not be synthesized in toto; parts of the gene may be obtained from nature. One may use any genetic engineering method to produce the correct gene fusion, so long as one can easily and accurately direct mutations to specific sites in the pbd DNA subsequence. In all of the methods of mutagenesis considered in the present invention, however, it is necessary that the coding sequence for the osp-ipbd gene be different from any other DNA in the OCV. The degree and nature of difference needed is determined by the method of mutagenesis to be used. If the method of mutagenesis is to be replacement of subsequences coding for the PBD with vgDNA, then the subsequences to be mutagenized are preferably bounded by restriction sites that are unique with respect to the rest of the OCV. Use of non-unique sites involves partial digestion, which is less efficient than complete digestion of a unique site and is not preferred. If single-stranded-oligonucleotide-directed mutagenesis is to be used, then the DNA sequence of the subsequence coding for the IPBD must be unique with respect to the rest of the OCV.

The coding portions of genes to be synthesized are designed at the protein level and then encoded in DNA. The amino acid sequences are chosen to achieve various goals, including: a) display of an IPBD on the surface of a GP, b) change of charge on an IPBD, and c) generation of a population of PBDs from which to select an SBD. These issues are discussed in more detail below. The ambiguity in the genetic code is exploited to allow optimal placement of restriction sites and to create various distributions of amino acids at variegated codons.

While the invention does not require any particular number or placement of restriction sites, it is generally preferable to engineer restriction sites into the gene to facilitate subsequent manipulations. Preferably, the gene provides a series of fairly uniformly spaced unique restriction sites with no more than a preset maximum number of bases, for example 100, between sites. Preferably, the gene is designed so that its insertion into the OCV does not destroy the uniqueness of unique restriction sites of the OCV. Preferred recognition sites are those for restriction enzymes which a) generate cohesive ends, b) have unambiguous recognition, or c) have higher specific activity.
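The spacing preference can be checked mechanically. The sketch below is illustrative only: the function names, the small site table, and the toy gene are assumptions, the enzymes shown all have palindromic recognition sequences (so scanning one strand suffices), and spacing is measured between the start positions of successive unique sites.

```python
# Hypothetical site table; recognition sequences are the standard ones for these enzymes.
SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
    "KpnI": "GGTACC",
}


def unique_site_positions(seq, sites=SITES):
    """Map enzyme name to position for every site that occurs exactly once in seq."""
    seq = seq.upper()
    found = {}
    for enzyme, site in sites.items():
        first = seq.find(site)
        if first != -1 and seq.find(site, first + 1) == -1:
            found[enzyme] = first
    return found


def max_gap(seq, sites=SITES):
    """Largest distance between consecutive unique sites, counting both ends of seq."""
    positions = sorted(unique_site_positions(seq, sites).values())
    bounds = [0] + positions + [len(seq)]
    return max(b - a for a, b in zip(bounds, bounds[1:]))


# Toy gene (illustrative): one EcoRI site and one HindIII site.
toy_gene = "ACGT" * 5 + "GAATTC" + "ACGT" * 10 + "AAGCTT" + "ACGT" * 3
spacing_ok = max_gap(toy_gene) <= 100  # 100 is the example maximum given in the text
```

Running the same check on the OCV before and after insertion of the gene would show whether the insertion has destroyed the uniqueness of any site.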

The ambiguity of the DNA between the restriction sites is resolved from the following considerations. If the given amino acid sequence occurs in the recipient organism, and if the DNA sequence of the gene in the organism is known, then, preferably, we maximize the differences between the engineered and natural genes to minimize the potential for recombination. In addition, the following codons are poorly translated in E. coli and, therefore, are avoided if possible: cta (L), cga (R), cgg (R), and agg (R). For other host species, different codon restrictions would be appropriate. Finally, long repeats of any one base are prone to mutation and thus are avoided. Balancing these considerations, we can design a DNA sequence.
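Two of these considerations lend themselves to a simple automated check, sketched below. The list of avoided codons is the one given above for E. coli; the function names, the toy coding sequence, and the run-length threshold are assumptions, since the text does not say how long a repeat must be to count as "long".

```python
import re

AVOIDED_CODONS = {"cta", "cga", "cgg", "agg"}  # poorly translated in E. coli, per the text


def avoided_codons(cds):
    """Return (offset, codon) for each in-frame codon of cds that should be avoided."""
    cds = cds.lower()
    return [
        (i, cds[i:i + 3])
        for i in range(0, len(cds) - 2, 3)
        if cds[i:i + 3] in AVOIDED_CODONS
    ]


def long_runs(seq, max_run=5):
    """Return (offset, run) for each run of one base longer than max_run (assumed threshold)."""
    pattern = r"(.)\1{%d,}" % max_run
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq.lower())]


toy_cds = "atg" + "cta" + "cgt" + "agg" + "aaa" + "aaa" + "tga"  # illustrative only
flagged_codons = avoided_codons(toy_cds)
flagged_runs = long_runs(toy_cds)
```

A designer would then replace each flagged codon with a synonymous one and break up each flagged run, rechecking that no wanted restriction site has been lost.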

Structural Considerations

The design of the amino-acid sequence that the ipbd-osp gene is to encode involves a number of structural considerations. The design is somewhat different for each type of GP. In bacteria, OSPs are not essential, so there is no requirement that the OSP domain of a fusion have any of its parental functions beyond lodging in the outer membrane.

Relationship between PBD and OSP

It is not required that the PBD and OSP domains have any particular spatial relationship; hence the process of this invention does not require use of the method of US Patent '692.

It is, in fact, desirable that the OSP not constrain the orientation of the PBD domain; this is not to be confused with lack of constraint within the PBD. Cwirla et al. (CWIR90), Scott and Smith (SCOT90), and Devlin et al. (DEVL90) have taught that variable residues in phage-displayed random peptides should be free of influence from the phage OSP. We teach that binding domains having a moderate to high degree of conformational constraint will exhibit higher specificity and that higher affinity is also possible. Thus, we prescribe picking codons for variegation that specify amino acids that will appear in a well-defined framework. The nature of the side groups is varied through a very wide range due to the combinatorial replacement of multiple amino acids. The main chain conformations of most PBDs of a given class are very similar. The movement of the PBD relative to the OSP should not, however, be restricted. Thus, it is often appropriate to include a flexible linker between the PBD and the OSP. Such flexible linkers can be taken from naturally occurring proteins known to have flexible regions. For example, the gIII protein of M13 contains glycine-rich regions thought to allow the amino-terminal domains a high degree of freedom. Such flexible linkers may also be designed. Segments of polypeptides that are rich in the amino acids GLY, ASN, SER, and ASP are likely to give rise to flexibility. Multiple glycines are particularly preferred.

Constraints Imposed by OSP

When we choose to insert the PBD into a surface loop of an OSP such as LamB, OmpA, or M13 gIII protein, there are a few considerations that do not arise when PBD is joined to the end of an OSP. In these cases, the OSP exerts some constraining influence on the PBD; the ends of the PBD are held in more or less fixed positions. We could insert a highly varied DNA sequence into the osp gene at codons that encode a surface-exposed loop and select for cells that have a specific-binding phenotype. When the identified amino-acid sequence is synthesized (by any means), the constraint of the OSP is lost and the peptide is likely to have a much lower affinity for the target and a much lower specificity. Tan and Kaiser (TANN77) found that a synthetic model of BPTI containing all the amino acids of BPTI that contact trypsin has a K_(d) for trypsin approximately 10⁷-fold higher than that of BPTI. Thus, it is strongly preferred that the varied amino acids be part of a PBD in which the structural constraints are supplied by the PBD.

It is known that the amino acids adjoining foreign epitopes inserted into LamB influence the immunological properties of these epitopes (VAND90). We expect that PBDs inserted into loops of LamB, OmpA, or similar OSPs will be influenced by the amino acids of the loop and by the OSP in general. To obtain appropriate display of the PBD, it may be necessary to add one or more linker amino acids between the OSP and the PBD. Such linkers may be taken from natural proteins or designed on the basis of our knowledge of the structural behavior of amino acids. Sequences rich in GLY, SER, ASN, ASP, ARG, and THR are appropriate. One to five amino acids at either junction are likely to impart the desired degree of flexibility between the OSP and the PBD.

Phage OSP

A preferred site for insertion of the ipbd gene into the phage osp gene is one in which: a) the IPBD folds into its original shape, b) the OSP domains fold into their original shapes, and c) there is no interference between the two domains.

If there is a model of the phage that indicates that either the amino or carboxy terminus of an OSP is exposed to solvent, then the exposed terminus of that mature OSP becomes the prime candidate for insertion of the ipbd gene. A low resolution 3D model suffices.

In the absence of a 3D structure, the amino and carboxy termini of the mature OSP are the best candidates for insertion of the ipbd gene. A functional fusion may require additional residues between the IPBD and OSP domains to avoid unwanted interactions between the domains. Random-sequence DNA, or DNA coding for a specific sequence of a protein homologous to the IPBD or OSP, can be inserted between the osp fragment and the ipbd fragment if needed.

Fusion at a domain boundary within the OSP is also a good approach for obtaining a functional fusion. Smith exploited such a boundary when subcloning heterologous DNA into gene III of f1 (SMIT85).

The criteria for identifying OSP domains suitable for causing display of an IPBD are somewhat different from those used to identify an IPBD. When identifying an OSP, minimal size is not so important because the OSP domain will not appear in the final binding molecule nor will we need to synthesize the gene repeatedly in each variegation round. The major design concerns are that: a) the OSP::IPBD fusion causes display of IPBD, b) the initial genetic construction be reasonably convenient, and c) the osp::ipbd gene be genetically stable and easily manipulated. There are several methods of identifying domains. Methods that rely on atomic coordinates have been reviewed by Janin and Chothia (JANI85). These methods use matrices of distances between α carbons (C_(α)), dividing planes (cf. ROSE85), or buried surface (RASH84). Chothia and collaborators have correlated the behavior of many natural proteins with domain structure (according to their definition). Rashin correctly predicted the stability of a domain comprising residues 206-316 of thermolysin (VITA84, RASH84).

Many researchers have used partial proteolysis and protein sequence analysis to isolate and identify stable domains. (See, for example, VITA84, POTE83, SCOT87a, and PABO79.) Pabo et al. used calorimetry as an indicator that the cI repressor from the coliphage λ contains two domains; they then used partial proteolysis to determine the location of the domain boundary.

If the only structural information available is the amino acid sequence of the candidate OSP, we can use the sequence to predict turns and loops. There is a high probability that some of the loops and turns will be correctly predicted (cf. Chou and Fasman (CHOU74)); these locations are also candidates for insertion of the ipbd gene fragment.

Bacterial OSPs

In bacterial OSPs, the major considerations are: a) that the PBD is displayed, and b) that the chimeric protein not be toxic.

From topological models of OSPs, we can determine whether the amino or carboxy terminus of the OSP is exposed. If so, then the exposed terminus is an excellent choice for fusion of the osp fragment to the ipbd fragment.

The lamB gene has been sequenced and is available on a variety of plasmids (CLEM81, CHAR88). Numerous fusions of fragments of lamB with a variety of other genes have been used to study export of proteins in E. coli. From various studies, Charbit et al. (CHAR88) have proposed a model that specifies which residues of LamB are: a) embedded in the membrane, b) facing the periplasm, and c) facing the cell surface; we adopt the numbering of this model for amino acids in the mature protein. According to this model, several loops on the outer surface are defined, including: 1) residues 88 through 111, 2) residues 145 through 165, and 3) residues 236 through 251.

Consider a mini-protein embedded in LamB. For example, insertion of DNA encoding G₁NXCX₅XXXCX₁₀SG₁₂ (SEQ ID NO: 8) between codons 153 and 154 of lamB is likely to lead to a wide variety of LamB derivatives being expressed on the surface of E. coli cells. G₁, N₂, S₁₁, and G₁₂ are supplied to allow the mini-protein sufficient orientational freedom that it can interact optimally with the target. Using affinity enrichment (involving, for example, FACS via a fluorescently labeled target, perhaps through several rounds of enrichment), we might obtain a strain (named, for example, BEST) that expresses a particular LamB derivative that shows high affinity for the predetermined target. An octapeptide having the sequence of the inserted residues 3 through 10 from BEST is likely to have an affinity and specificity similar to that observed in BEST because the octapeptide has an internal structure that keeps the amino acids in a conformation that is quite similar in the LamB derivative and in the isolated mini-protein.

Consideration of the Signal Peptide

Fusing one or more new domains to a protein may make the ability of the new protein to be exported from the cell different from the ability of the parental protein. The signal peptide of the wild-type coat protein may function for the authentic polypeptide but be unable to direct export of a fusion. To utilize the Sec-dependent pathway, one may need a different signal peptide. Thus, to express and display a chimeric BPTI/M13 gene VIII protein, we found it necessary to utilize a heterologous signal peptide (that of phoA).

Provision of a Means to Remove PBD from the GP

GPs that display peptides having high affinity for the target may be quite difficult to elute from the target, particularly a multivalent target. (Bacteria that are bound very tightly can simply multiply in situ.) For phage, one can introduce a cleavage site for a specific protease, such as blood-clotting Factor Xa, into the fusion OSP protein so that the binding domain can be cleaved from the genetic package. Such cleavage has the advantage that all resulting phage have identical OSPs and therefore are equally infective, even if polypeptide-displaying phage can be eluted from the affinity matrix without cleavage. This step allows recovery of valuable genes which might otherwise be lost. To our knowledge, no one has disclosed or suggested using a specific protease as a means to recover an information-containing genetic package or of converting a population of phage that vary in infectivity into phage having identical infectivity.

IV.G. Synthesis of Gene Inserts

The present invention is not limited as to how a designed DNA sequence is divided for easy synthesis. An established method is to synthesize both strands of the entire gene in overlapping segments of 20 to 50 nucleotides (nts) (THER88). An alternative method that is more suitable for synthesis of vgDNA is an adaptation of methods published by Oliphant et al. (OLIP86 and OLIP87) and Ausubel et al. (AUSU87). It differs from previous methods in that it: a) uses two synthetic strands, and b) does not cut the extended DNA in the middle. Our goals are: a) to produce longer pieces of dsDNA than can be synthesized as ssDNA on commercial DNA synthesizers, and b) to produce strands complementary to single-stranded vgDNA. By using two synthetic strands, we remove the requirement for a palindromic sequence at the 3' end.

DNA synthesizers can currently produce oligo-nts of lengths up to 200 nts in reasonable yield, i.e., M_(DNA) = 200. The parameters N_(w) (the length of overlap needed to obtain efficient annealing) and N_(s) (the number of spacer bases needed so that a restriction enzyme can cut near the end of blunt-ended dsDNA) are determined by DNA and enzyme chemistry. N_(w) = 10 and N_(s) = 5 are reasonable values. Larger values of N_(w) and N_(s) are allowed but add to the length of ssDNA that is to be synthesized and reduce the net length of dsDNA that can be produced.

Let A_(L) be the actual length of dsDNA to be synthesized, including any spacers. A_(L) must be no greater than (2 M_(DNA) - N_(w)). Let Q_(w) be the number of nts that the overlap window can deviate from center,

    Q_(w) = (2 M_(DNA) - N_(w) - A_(L))/2.

Q_(w) is never negative. It is preferred that the two fragments be approximately the same length so that the amounts synthesized will be approximately equal. This preference may be overridden by other considerations. The overall yield of dsDNA is usually dominated by the synthetic yield of the longer oligo-nt.
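As a worked illustration of these length relations, the following sketch computes the maximum dsDNA length and the window slack Q_(w) for given M_(DNA), N_(w), and A_(L). The function names are hypothetical; the default values are the ones stated in the text.

```python
def max_dsdna_length(m_dna=200, n_w=10):
    """Longest dsDNA obtainable from two oligo-nts of at most m_dna nts that overlap by n_w nts."""
    return 2 * m_dna - n_w


def overlap_slack(a_l, m_dna=200, n_w=10):
    """Q_w: how many nts the overlap window may deviate from center for a dsDNA of length a_l."""
    limit = max_dsdna_length(m_dna, n_w)
    if a_l > limit:
        raise ValueError(f"A_L = {a_l} exceeds the limit of {limit} nts")
    return (limit - a_l) / 2


print(max_dsdna_length())   # 390
print(overlap_slack(390))   # 0.0: the window must sit exactly at the center
print(overlap_slack(300))   # 45.0
```

The error raised for an over-long A_(L) corresponds to the requirement that A_(L) not exceed (2 M_(DNA) - N_(w)), which is also why Q_(w) cannot be negative.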

We use the following procedure to generate dsDNA of lengths up to (2 M_(DNA) - N_(w)) nts through the use of Klenow fragment to extend synthetic ssDNA fragments that are not more than M_(DNA) nts long. When a pair of long oligo-nts, complementary for N_(w) nts at their 3' ends, are annealed there will be a free 3' hydroxyl and a long ssDNA chain continuing in the 5' direction on either side. We will refer to this situation as a 5' superoverhang. The procedure comprises:

1) picking a non-palindromic subsequence of N_(w) to N_(w) + 4 nts near the center of the dsDNA to be synthesized; this region is called the overlap (typically, N_(w) is 10),

2) synthesizing a ssDNA molecule that comprises that part of the anti-sense strand from its 5' end up to and including the overlap,

3) synthesizing a ssDNA molecule that comprises that part of the sense strand from its 5' end up to and including the overlap,

4) annealing the two synthetic strands that are complementary throughout the overlap region, and

5) extending both superoverhangs with Klenow fragment and all four deoxynucleotide triphosphates.
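The five steps can be mimicked in silico. The sketch below is an idealized model, not a protocol: it treats the strand written 5'-to-3' as a string, assumes perfect annealing and complete fill-in, and uses hypothetical function names and an arbitrary 40-nt toy sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(strand, start, n_w=10):
    """Steps 1-3: split a strand (written 5'->3') into two oligo-nts sharing an overlap.

    oligo_a runs from the 5' end of the given strand through the overlap;
    oligo_b is the complementary strand from its own 5' end through the overlap.
    """
    end = start + n_w
    if start <= 0 or end >= len(strand):
        raise ValueError("overlap must lie strictly inside the sequence")
    overlap = strand[start:end]
    if overlap == revcomp(overlap):
        raise ValueError("overlap is palindromic; a strand could prime itself")
    return strand[:end], revcomp(strand[start:])


def klenow_extend(oligo_a, oligo_b, n_w=10):
    """Steps 4-5: anneal at the 3' overlaps and fill in both 5' superoverhangs."""
    if oligo_a[-n_w:] != revcomp(oligo_b[-n_w:]):
        raise ValueError("3' ends are not complementary over the overlap")
    strand = oligo_a + revcomp(oligo_b[:-n_w])
    return strand, revcomp(strand)


target = "ATGGCTAGCTTGACCGGTACGATCCGTAAGCTTGGCATGC"  # illustrative 40-nt sequence
oligo_a, oligo_b = design_oligos(target, start=15)
rebuilt, other_strand = klenow_extend(oligo_a, oligo_b)
```

Here the two 25-nt oligo-nts regenerate the full 40-nt duplex, showing how the product length approaches twice the oligo-nt length less the overlap.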

Because M_(DNA) is not rigidly fixed at 200, the current limits of 390 (= 2 M_(DNA) - N_(w)) nts overall and 200 in each fragment are not rigid, but can be exceeded by 5 or 10 nts. Going beyond the limits of 390 and 200 will lead to lower yields, but these may be acceptable in certain cases.

Restriction enzymes do not cut well at sites closer than about five base pairs from the end of blunt dsDNA fragments (OLIP87 and p. 132, New England BioLabs 1990-1991 Catalogue). Therefore N_(s) nts (with N_(s) typically set to 5) of spacer are added to ends that we intend to cut with a restriction enzyme. If the plasmid is to be cut with a blunt-cutting enzyme, then we do not add any spacer to the corresponding end of the dsDNA fragment.

To choose the optimum site of overlap for the oligo-nt fragments, first consider the anti-sense strand of the DNA to be synthesized, including any spacers at the ends, written (in upper case) from 5' to 3' and left-to-right. N.B.: The N_(w) nt long overlap window can never include bases that are to be variegated. N.B.: The N_(w) nt long overlap should not be palindromic lest single DNA molecules prime themselves. Place a N_(w) nt long window as close to the center of the anti-sense sequence as possible. Check to see whether one or more codons within the window can be changed to increase the GC content without: a) destroying a needed restriction site, b) changing amino acid sequence, or c) making the overlap region palindromic. If possible, change some AT base pairs to GC pairs. If the GC content of the window is less than 50%, slide the window right or left as much as Q_(w) nts to maximize the number of C's and G's inside the window, but without including any variegated bases. For each trial setting of the overlap window, maximize the GC content by silent codon changes, but do not destroy wanted restriction sites or make the overlap palindromic. If the best setting still has less than 50% GC, enlarge the window to N_(w) + 2 nts and place it within five nts of the center to obtain the maximum GC content. If enlarging the window one or two nts will increase the GC content, do so, but do not include variegated bases.
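The window search can be sketched as follows. This is a simplified model under stated assumptions: it always picks the highest-GC admissible window within Q_(w) of center (rather than sliding only when GC falls below 50%), it does not attempt silent codon changes or window enlargement, and the function names, the set of variegated positions, and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_count(seq):
    """Number of G and C bases in seq."""
    return sum(base in "GC" for base in seq)


def choose_overlap(seq, variegated, n_w=10, q_w=0):
    """Return the start of the best overlap window, or None if no window is admissible.

    Windows containing a variegated position or forming a palindrome are rejected;
    among the rest, the highest GC count wins, with ties going to the window nearest center.
    """
    seq = seq.upper()
    center = (len(seq) - n_w) // 2
    lo = max(0, center - q_w)
    hi = min(len(seq) - n_w, center + q_w)
    best = None
    for start in range(lo, hi + 1):
        window = seq[start:start + n_w]
        if any(start <= pos < start + n_w for pos in variegated):
            continue
        if window == revcomp(window):
            continue
        key = (gc_count(window), -abs(start - center))
        if best is None or key > best[0]:
            best = (key, start)
    return None if best is None else best[1]


toy = "ATATATATATATA" + "GGCCGGCGCC" + "ATATATA"  # illustrative 30-nt sequence
best_start = choose_overlap(toy, variegated=set(), q_w=5)
```

With no variegated positions the GC-rich block at offset 13 is chosen; marking position 15 as variegated excludes every window that touches it and forces the search to the only remaining admissible start.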

Underscore the anti-sense strand from the 5' end up to the right edge of the window. Write the complementary sense sequence 3'-to-5' and left-to-right and in lower case letters, under the anti-sense strand starting at the left edge of the window and continuing all the way to the right end of the anti-sense strand.

We will synthesize the underscored anti-sense strand and the part of the sense strand that we wrote. These two fragments, complementary over the length of the window of high GC content, are mixed in equimolar quantities and annealed. These fragments are extended with Klenow fragment and all four deoxynucleotide triphosphates to produce ds blunt-ended DNA. This DNA can be cut with appropriate restriction enzymes to produce the cohesive ends needed to ligate the fragment to other DNA.

The present invention is not limited to any particular method of DNA synthesis or construction. Conventional DNA synthesizers may be used, with appropriate reagent modifications for production of variegated DNA (similar to that now used for production of mixed probes). For example, the Milligen 7500 DNA synthesizer has seven vials from which phosphoramidites may be taken. Normally, the first four contain A, C, T, and G. The other three vials may contain unusual bases such as inosine or mixtures of bases, the so-called "dirty bottle". The standard software allows programmed mixing of two, three, or four bases in equimolar quantities.
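The effect of equimolar base mixing on the encoded amino acids can be estimated by enumeration. The sketch below uses the standard genetic code; the function name and the two example mixtures (all four bases at every position, and all four bases at the first two positions with G or T at the third) are illustrative assumptions and are not variegation schemes taken from this specification.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks a stop codon.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_distribution(mix):
    """Fraction of each amino acid (or stop) encoded by an equimolar mixed codon.

    mix is a sequence of three strings, each listing the bases allowed at that position.
    """
    counts = Counter(CODON_TABLE[a + b + c] for a, b, c in product(*mix))
    total = sum(counts.values())
    return {amino: n / total for amino, n in counts.items()}


all_four = codon_distribution(("ACGT", "ACGT", "ACGT"))
third_gt = codon_distribution(("ACGT", "ACGT", "GT"))
```

Restricting the third position to G or T still encodes all twenty amino acids while lowering the fraction of stop codons from 3/64 to 1/32, which is one way the ambiguity of the genetic code can be used to shape the distribution at a variegated codon.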

The synthesized DNA may be purified by any art-recognized technique, e.g., by high-pressure liquid chromatography (HPLC) or PAGE.

The osp-pbd genes may be created by inserting vgDNA into an existing parental gene, such as the osp-ipbd shown to be displayable by a suitably transformed GP. The present invention is not limited to any particular method of introducing the vgDNA; however, two techniques are discussed below.

In the case of cassette mutagenesis, the restriction sites that were introduced when the gene for the inserted domain was synthesized are used to introduce the synthetic vgDNA into a plasmid or other OCV. Restriction digestions and ligations are performed by standard methods (AUSU87).

In the case of single-stranded-oligonucleotide-directed mutagenesis, synthetic vgDNA is used to create diversity in the vector (BOTS85).

The modes of creating diversity in the population of GPs discussed herein are not the only modes possible. Any method of mutagenesis that preserves at least a large fraction of the information obtained from one selection and then introduces other mutations in the same domain will work. The limiting factors are the number of independent transformants that can be produced and the amount of enrichment one can achieve through affinity separation. Therefore the preferred embodiment uses a method of mutagenesis that focuses mutations into those residues that are most likely to affect the binding properties of the PBD and are least likely to destroy the underlying structure of the IPBD.

Other modes of mutagenesis might allow other GPs to be considered. For example, the bacteriophage λ is not a useful cloning vehicle for cassette mutagenesis because of the plethora of restriction sites. One can, however, use single-stranded-oligo-nt-directed mutagenesis on λ without the need for unique restriction sites. No one has used single-stranded-oligo-nt-directed mutagenesis to introduce the high level of diversity called for in the present invention, but if it is possible, such a method would allow use of phage with large genomes.

IV.H. Operative Cloning Vector

The operative cloning vector (OCV) is a replicable nucleic acid used to introduce the chimeric osp-ipbd or ipbd-osp gene into the genetic package. When the genetic package is a virus, it may serve as its own OCV. For cells and spores, the OCV may be a plasmid, a virus, a phagemid, or a chromosome.

The OCV is preferably small (less than 10 kb), stable (even after insertion of at least 1 kb DNA), present in multiple copies within the host cell, and selectable with appropriate media. It is desirable that cassette mutagenesis be practical in the OCV; preferably, at least 25 restriction enzymes are available that do not cut the OCV. It is likewise desirable that single-stranded mutagenesis be practical. If a suitable OCV does not already exist, it may be engineered by manipulation of available vectors.

When the GP is a bacterial cell or spore, the OCV is preferably a plasmid because genes on plasmids are much more easily constructed and mutated than are genes in the bacterial chromosome. When bacteriophage are to be used, the osp-ipbd gene is inserted into the phage genome. The synthetic osp-ipbd genes can be constructed in small vectors and transferred to the GP genome when complete.

Phage such as M13 do not confer antibiotic resistance on the host, so one cannot select for cells infected with M13. An antibiotic resistance gene can be engineered into the M13 genome (HINE80). More virulent phage, such as ΦX174, make discernible plaques that can be picked, in which case a resistance gene is not essential; furthermore, there is no room in the ΦX174 virion to add any new genetic material. Inability to include an antibiotic resistance gene is a disadvantage because it limits the number of GPs that can be screened.

It is preferred that GP(IPBD) carry a selectable marker not carried by wtGP. It is also preferred that wtGP carry a selectable marker not carried by GP(IPBD).

A derivative of M13 is the most preferred OCV when the phage also serves as the GP. Wild-type M13 does not confer any resistances on infected cells; M13 is a pure parasite. A "phagemid" is a hybrid between a phage and a plasmid, and is used in this invention. Double-stranded plasmid DNA isolated from phagemid-bearing cells is denoted by the standard convention, e.g. pXY24. Phage prepared from these cells would be designated XY24. Phagemids such as Bluescript K/S (sold by Stratagene) are not preferred for our purposes because Bluescript does not contain the full genome of M13 and must be rescued by coinfection with competent wild-type M13. Such coinfections could lead to genetic recombination yielding heterogeneous phage unsuitable for the purposes of the present invention. Phagemids may be entirely suitable for developing a gene that causes an IPBD to appear on the surface of phage-like genetic packages.

It is also well known that plasmids containing the ColE1 origin of replication can be greatly amplified if protein synthesis is halted in a log-phase culture. Protein synthesis can be halted by addition of chloramphenicol or other agents (MANI82).

The bacteriophage M13 bla 61 (ATCC 37039) is derived from wild-type M13 through the insertion of the β-lactamase gene (HINE80). This phage contains 8.13 kb of DNA. M13 bla cat 1 (ATCC 37040) is derived from M13 bla 61 through the additional insertion of the chloramphenicol resistance gene (HINE80); M13 bla cat 1 contains 9.88 kb of DNA. Although neither of these variants of M13 contains the ColE1 origin of replication, either could be used as a starting point to construct a cloning vector with this feature.

IV.I. Transformation of Cells

When the GP is a cell, the population of GPs is created by transforming the cells with suitable OCVs. When the GP is a phage, the phage are genetically engineered and then transfected into host cells suitable for amplification. When the GP is a spore, cells capable of sporulation are transformed with the OCV while in a normal metabolic state, and then sporulation is induced so as to cause the OSP-PBDs to be displayed. The present invention is not limited to any one method of transforming cells with DNA. The procedure given in the examples is a modification of that of Maniatis (p250, MANI82). One preferably obtains at least 10⁷ and more preferably at least 10⁸ transformants/μg of CCC DNA.

The transformed cells are grown first under non-selective conditions that allow expression of plasmid genes and then selected to kill untransformed cells. Transformed cells are then induced to express the osp-pbd gene at the appropriate level of induction. The GPs carrying the IPBD or PBDs are then harvested by methods appropriate to the GP at hand, generally centrifugation to pellet the GPs and resuspension of the pellets in sterile medium (cells) or buffer (spores or phage). They are then ready for verification that the display strategy was successful (where the GPs all display a "test" IPBD) or for affinity selection (where the GPs display a variety of different PBDs).

IV.J. Verification of Display Strategy

The harvested packages are tested to determine whether the IPBD is present on the surface. In any tests of GPs for the presence of IPBD on the GP surface, any ions or cofactors known to be essential for the stability of IPBD or AfM(IPBD) are included at appropriate levels. The tests can be done: a) by affinity labeling, b) enzymatically, c) spectrophotometrically, d) by affinity separation, or e) by affinity precipitation. The AfM(IPBD) in this step is one picked to have strong affinity (preferably, K_(d) < 10⁻¹¹ M) for the IPBD molecule and little or no affinity for the wtGP. For example, if BPTI were the IPBD, trypsin, anhydrotrypsin, or antibodies to BPTI could be used as the AfM(BPTI) to test for the presence of BPTI. Anhydrotrypsin, a trypsin derivative with serine 195 converted to dehydroalanine, has no proteolytic activity but retains its affinity for BPTI (AKOH72 and HUBE77).

Preferably, the presence of the IPBD on the surface of the GP is demonstrated through the use of a soluble, labeled derivative of an AfM(IPBD) with high affinity for IPBD. The label could be: a) a radioactive atom such as ¹²⁵I, b) a chemical entity such as biotin, or c) a fluorescent entity such as rhodamine or fluorescein. The labeled derivative of AfM(IPBD) is denoted as AfM(IPBD)*. The preferred procedure is:

1) mix AfM(IPBD)* with GPs that are to be tested for the presence of IPBD; conditions of mixing should favor binding of IPBD to AfM(IPBD)*,

2) separate GPs from unbound AfM(IPBD)* by use of:

a) a molecular sizing filter that will pass AfM(IPBD)* but not GPs,

b) centrifugation, or

c) a molecular sizing column (such as Sepharose or Sephadex) that retains free AfM(IPBD)* but not GPs,

3) quantitate the AfM(IPBD)* bound by GPs.

Alternatively, if the IPBD has a known biochemical activity (enzymatic or inhibitory), its presence on the GP can be verified through this activity. For example, if the IPBD were BPTI, then one could use the stoichiometric inactivation of trypsin not only to demonstrate the presence of BPTI, but also to quantitate the amount.
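The quantitation by stoichiometric inactivation can be sketched numerically. The following Python fragment is only an illustration: it assumes that each displayed BPTI inactivates exactly one trypsin molecule and that the number of GPs in the assay has been counted independently. The function name and the sample figures are hypothetical and are not taken from the examples.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def bpti_per_package(trypsin_inactivated_mol, n_packages, stoichiometry=1.0):
    """Estimate displayed BPTI molecules per genetic package.

    trypsin_inactivated_mol: moles of trypsin activity lost on adding the GPs.
    n_packages: number of GPs in the assay (counted independently).
    stoichiometry: BPTI molecules per trypsin inactivated (assumed 1:1).
    """
    if n_packages <= 0:
        raise ValueError("n_packages must be positive")
    bpti_molecules = trypsin_inactivated_mol * stoichiometry * AVOGADRO
    return bpti_molecules / n_packages


# Hypothetical assay: 1.0e-12 mol of trypsin inactivated by 1.0e11 phage.
print(f"{bpti_per_package(1.0e-12, 1.0e11):.1f} BPTI per GP")
```

A control with wtGP would normally be subtracted first, so that only inactivation attributable to the displayed domain is counted.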

If the IPBD has strong, characteristic absorption bands in the visible or UV that are distinct from absorption by the wtGP, then another alternative for measuring the IPBD displayed on the GP is a spectrophotometric measurement. For example, if IPBD were azurin, the visible absorption could be used to identify GPs that display azurin.

Another alternative is to label the GPs and measure the amount of label retained by immobilized AfM(IPBD). For example, the GPs could be grown with a radioactive precursor, such as ³²P or ³H-thymidine, and the radioactivity retained by immobilized AfM(IPBD) measured.

Another alternative is to use affinity chromatography; the ability of a GP bearing the IPBD to bind a matrix that supports an AfM(IPBD) is measured by reference to the wtGP.

Another alternative for detecting the presence of IPBD on the GP surface is affinity precipitation.

If random DNA has been used, then affinity selection procedures are used to obtain a clonal isolate that has the display-of-IPBD phenotype. Alternatively, clonal isolates may be screened for the display-of-IPBD phenotype. The tests of this step are applied to one or more of these clonal isolates.

If no isolates that bind to the affinity molecule are obtained, we take corrective action as disclosed below.

If one or more of the tests above indicates that the IPBD is displayed on the GP surface, we verify that the binding of molecules having known affinity for IPBD is due to the chimeric osp-ipbd gene through the use of standard genetic and biochemical techniques, such as:

1) transferring the osp-ipbd gene into the parent GP to verify that osp-ipbd confers binding,

2) deleting the osp-ipbd gene from the isolated GP to verify that loss of osp-ipbd causes loss of binding,

3) showing that binding of GPs to AfM(IPBD) correlates with [XINDUCE] (in those cases that expression of osp-ipbd is controlled by [XINDUCE]), and

4) showing that binding of GPs to AfM(IPBD) is specific to the immobilized AfM(IPBD) and not to the support matrix.

Three measurements are linear in the amount of IPBD displayed: a) binding of GPs by soluble AfM(IPBD)*, b) absorption caused by IPBD, and c) biochemical reactions of IPBD. Presence of IPBD on the GP surface is indicated by a strong correlation between [XINDUCE] and the reactions that are linear in the amount of IPBD. Leakiness of the promoter is not likely to present problems of high background with assays that are linear in the amount of IPBD. These experiments may be quicker and easier than the genetic tests. Interpreting the effect of [XINDUCE] on binding to a {AfM(IPBD)} column, however, may be problematic unless the regulated promoter is completely repressed in the absence of [XINDUCE]. The affinity retention of GP(IPBD)s is not linear in the number of IPBDs/GP and there may be, for example, little phenotypic difference between GPs bearing 5 IPBDs and GPs bearing 50 IPBDs. The demonstration that binding is to AfM(IPBD) and the genetic tests are essential; the tests with [XINDUCE] are optional.

We sequence the relevant ipbd gene fragment from each of several clonal isolates to determine the construction. We also establish the maximum salt concentration and pH range for which the GP(IPBD) binds the chosen AfM(IPBD). This is preferably done by measuring, as a function of salt concentration and pH, the retention of AfM(IPBD)* on molecular sizing filters that pass AfM(IPBD)* but not GP. This information will be used in refining the affinity selection scheme.

IV.K. Analysis and Correction of Display Problems

If the IPBD is displayed on the outside of the GP, and if that display is clearly caused by the introduced osp-ipbd gene, we proceed with variegation; otherwise we analyze the result and adopt appropriate corrective measures. If we have unsuccessfully attempted to fuse an ipbd fragment to a natural osp fragment, our options are: 1) pick a different fusion to the same osp by a) using the opposite end of osp, b) keeping more or fewer residues from osp in the fusion, for example, in increments of 3 or 4 residues, c) trying a known or predicted domain boundary, or d) trying a predicted loop or turn position, 2) pick a different osp, or 3) switch to the random DNA method. If we have just tried the random DNA method unsuccessfully, our options are: 1) choose a different relationship between ipbd fragment and random DNA (ipbd first, random DNA second or vice versa), 2) try a different degree of partial digestion, a different enzyme for partial digestion, a different degree of shearing or a different source of natural DNA, or 3) switch to the natural OSP method. If all reasonable OSPs of the current GP have been tried and the random DNA method has been tried, both without success, we pick a new GP.

We may illustrate the ways in which problems may be attacked by using the example of BPTI as the IPBD, the M13 phage as the GP, and the major coat (gene VIII) protein as the OSP. The following amino-acid sequence, called AA_seq2, illustrates how the sequence for mature BPTI (shown underscored) may be inserted immediately after the signal sequence of M13 precoat protein (indicated by the arrow) and before the sequence for the M13 CP. ##STR4##

We adopt the convention that sequence numbers of fusion proteins refer to the fusion, as coded, unless otherwise noted. Thus the alanine that begins M13 CP is referred to as "number 82", "number 1 of M13 CP", or "number 59 of the mature BPTI-M13 CP fusion".
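The three numbering schemes can be related by simple offsets. The sketch below is illustrative only; the segment lengths are inferred from the numbers quoted above (82 - 59 = 23 residues of signal sequence, and 58 residues of mature BPTI ahead of the first M13 CP residue) rather than read from the sequence itself, and the function and key names are hypothetical.

```python
SIGNAL_LENGTH = 23  # inferred: fusion residue 82 is residue 59 of the mature fusion
BPTI_LENGTH = 58    # inferred: mature-fusion residue 59 is residue 1 of M13 CP


def describe_residue(fusion_number):
    """Return every numbering that applies to a residue of the coded fusion."""
    if fusion_number < 1:
        raise ValueError("residue numbers start at 1")
    names = {"fusion": fusion_number}
    bpti_end = SIGNAL_LENGTH + BPTI_LENGTH
    if fusion_number > SIGNAL_LENGTH:
        # Numbering of the processed (signal-cleaved) fusion protein.
        names["mature_fusion"] = fusion_number - SIGNAL_LENGTH
    if SIGNAL_LENGTH < fusion_number <= bpti_end:
        names["bpti"] = fusion_number - SIGNAL_LENGTH
    if fusion_number > bpti_end:
        names["m13_cp"] = fusion_number - bpti_end
    return names


print(describe_residue(82))  # the alanine that begins M13 CP
print(describe_residue(78))  # a residue inside the BPTI segment
```

Under these inferred lengths, residue 78 of the fusion falls at position 55 of BPTI, which is consistent with the later references to C78 as the last cysteine.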

It is desirable to determine where, exactly, the BPTI binding domain is being transported: is it remaining in the cytoplasm? Is it free within the periplasm? Is it attached to the inner membrane? Proteins in the periplasm can be freed through spheroplast formation using lysozyme and EDTA in a concentrated sucrose solution (BIRD67, MALA64). If BPTI were free in the periplasm, it would be found in the supernatant. Trypsin labeled with ¹²⁵I would be mixed with supernatant and passed over a non-denaturing molecular sizing column and the radioactive fractions collected. The radioactive fractions would then be analyzed by SDS-PAGE and examined for BPTI-sized bands by silver staining.

Spheroplast formation exposes proteins anchored in the inner membrane. Spheroplasts would be mixed with AHTrp* and then either filtered or centrifuged to separate them from unbound AHTrp*. After washing with hypertonic buffer, the spheroplasts would be analyzed for extent of AHTrp* binding.

If BPTI were found free in the periplasm, then we would expect that the chimeric protein was being cleaved both between BPTI and the M13 mature coat sequence and between BPTI and the signal sequence. In that case, we should alter the BPTI/M13 CP junction by inserting vgDNA at codons for residues 78-82 of AA_seq2.

If BPTI were found attached to the inner membrane, then two hypotheses can be formed. The first is that the chimeric protein is being cut after the signal sequence, but is not being incorporated into LG7 virion; the treatment would also be to insert vgDNA between residues 78 and 82 of AA_seq2. The alternative hypothesis is that BPTI could fold and react with trypsin even if signal sequence is not cleaved. N-terminal amino acid sequencing of trypsin-binding material isolated from cell homogenate determines what processing is occurring. If signal sequence were being cleaved, we would use the procedure above to vary residues between C78 and A82; subsequent passes would add residues after residue 81. If signal sequence were not being cleaved, we would vary residues between 23 and 27 of AA_seq2. Subsequent passes through that process would add residues after 23.

If BPTI were found neither in the periplasm nor on the inner membrane, then we would expect that the fault was in the signal sequence or the signal-sequence-to-BPTI junction. The treatment in this case would be to vary residues between 23 and 27.

Analytical experiments to determine what has gone wrong take time and effort and, for the foreseen outcomes, indicate variations in only two regions. Therefore, we believe it prudent to try the synthetic experiments described below without doing the analysis. For example, these six experiments that introduce variegation into the bpti-gene VIII fusion could be tried:

1) 3 variegated codons between residues 78 and 82 using olig#12 and olig#13,

2) 3 variegated codons between residues 23 and 27 using olig#14 and olig#15,

3) 5 variegated codons between residues 78 and 82 using olig#13 and olig#12a,

4) 5 variegated codons between residues 23 and 27 using olig#15 and olig#14a,

5) 7 variegated codons between residues 78 and 82 using olig#13 and olig#12b, and

6) 7 variegated codons between residues 23 and 27 using olig#15 and olig#14b.

To alter the BPTI-M13 CP junction, we introduce DNA variegated at codons for residues between 78 and 82 into the SphI and SfiI sites of pLG7. The residues after the last cysteine are highly variable in amino acid sequences homologous to BPTI, both in composition and length; in Table 25 these residues are denoted as G79, G80, and A81. The first part of the M13 CP is denoted as A82, E83, and G84. One of the oligo-nts olig#12, olig#12a, or olig#12b and the primer olig#13 are synthesized by standard methods. The oligo-nts are: ##STR5## where f is a mixture of (0.26 T, 0.18 C, 0.26 A, and 0.30 G), x is a mixture of (0.22 T, 0.16 C, 0.40 A, and 0.22 G), and k is a mixture of equal parts of T and G. The bases shown in lower case at either end are spacers and are not incorporated into the cloned gene. The primer is complementary to the 3' end of each of the longer oligo-nts. One of the variegated oligo-nts and the primer olig#13 are combined in equimolar amounts and annealed. The dsDNA is completed with all four (nt)TPs and Klenow fragment. The resulting dsDNA and RF pLG7 are cut with both SfiI and SphI, purified, mixed, and ligated. We then select a transformed clone that, when induced with IPTG, binds AHTrp.
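The effect of the base mixtures on the encoded amino acids can be worked out directly. The Python sketch below assumes, since the oligo-nt sequences themselves are not reproduced here, that each variegated codon carries the first mixture (0.26 T, 0.18 C, 0.26 A, 0.30 G) at its first base, the x mixture at its second base, and the k mixture at its third base; it also assumes the standard genetic code. All names in the fragment are illustrative.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Base mixtures as given in the text; assignment to codon positions is assumed.
F_MIX = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
X_MIX = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K_MIX = {"T": 0.50, "G": 0.50}


def amino_acid_distribution(first, second, third):
    """Probability of each amino acid ('*' = stop) for one variegated codon."""
    dist = defaultdict(float)
    for b1, p1 in first.items():
        for b2, p2 in second.items():
            for b3, p3 in third.items():
                dist[CODON_TABLE[b1 + b2 + b3]] += p1 * p2 * p3
    return dict(dist)


dist = amino_acid_distribution(F_MIX, X_MIX, K_MIX)
for aa, p in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{aa}: {p:.3f}")
```

Because the third base is restricted to T or G, the only stop codon that can arise under this assumed arrangement is TAG, while all twenty amino acids remain accessible.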

To vary the junction between M13 signal sequence and BPTI, we introduce DNA variegated at codons for residues between 23 and 27 into the KpnI and XhoI sites of pLG7. The first three residues are highly variable in amino acid sequences homologous to BPTI. Homologous sequences also vary in length at the amino terminus. One of the oligo-nts olig#14, olig#14a, or olig#14b and the primer olig#15 are synthesized by standard methods. The oligo-nts are: ##STR6## where f is a mixture of (0.26 T, 0.18 C, 0.26 A, and 0.30 G), x is a mixture of (0.22 T, 0.16 C, 0.40 A, and 0.22 G), and k is a mixture of equal parts of T and G. The bases shown in lower case at either end are spacers and are not incorporated into the cloned gene. One of the variegated oligo-nts and the primer are combined in equimolar amounts and annealed. The dsDNA is completed with all four (nt)TPs and Klenow fragment. The resulting dsDNA and RF pLG7 are cut with both KpnI and XhoI, purified, mixed, and ligated. We select a transformed clone that, when induced with IPTG, binds AHTrp or trp.

Other numbers of variegated codons could be used.
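The choice of 3, 5, or 7 variegated codons can be weighed against the number of transformants obtainable. The sketch below is a rough estimate under stated assumptions: each variegated codon is taken to allow 4 x 4 x 2 = 32 DNA sequences (two fully mixed positions and a T/G third position), all sequences are treated as equally likely (which the biased mixtures do not strictly satisfy), and sampling is treated as Poisson. The figure of 10⁸ transformants is the preferred yield per μg quoted earlier, applied here to a hypothetical 1 μg of DNA.

```python
import math

CODONS_PER_POSITION = 4 * 4 * 2  # assumed: two four-base mixtures and a T/G mixture


def dna_variants(n_codons):
    """Number of distinct DNA sequences for n variegated codons."""
    return CODONS_PER_POSITION ** n_codons


def expected_coverage(n_transformants, n_variants):
    """Expected fraction of variants seen at least once (Poisson, equal weights)."""
    return -math.expm1(-n_transformants / n_variants)


for n in (3, 5, 7):
    variants = dna_variants(n)
    coverage = expected_coverage(1.0e8, variants)
    print(f"{n} codons: {variants:.3g} DNA variants, coverage {coverage:.3f}")
```

Under these assumptions the smaller designs are sampled essentially completely, whereas seven variegated codons exceed what a single transformation of this size can represent.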

If none of these approaches produces a working chimeric protein, we may try a different signal sequence. If that doesn't work, we may try a different OSP.

V. AFFINITY SELECTION OF TARGET-BINDING MUTANTS

V.A. Affinity Separation Technology, Generally

Affinity separation is used initially in the present invention to verify that the display system is working, i.e., that a chimeric outer surface protein has been expressed and transported to the surface of the genetic package and is oriented so that the inserted binding domain is accessible to target material. When used for this purpose, the binding domain is a known binding domain for a particular target and that target is the affinity molecule used in the affinity separation process. For example, a display system may be validated by inserting DNA encoding BPTI into a gene encoding an outer surface protein of the genetic package of interest, and testing for binding to anhydrotrypsin, which is normally bound by BPTI.

If the genetic packages bind to the target, then we have confirmation that the corresponding binding domain is indeed displayed by the genetic package. Packages which display the binding domain (and thereby bind the target) are separated from those which do not.

Once the display system is validated, it is possible to use a variegated population of genetic packages which display a variety of different potential binding domains, and use affinity separation technology to determine how well they bind to one or more targets. This target need not be one bound by a known binding domain which is parental to the displayed binding domains, i.e., one may select for binding to a new target.

For example, one may variegate a BPTI binding domain and test for binding, not to trypsin, but to another serine protease, such as human neutrophil elastase or cathepsin G, or even to a wholly unrelated target, such as horse heart myoglobin.

The term "affinity separation means" includes, but is not limited to: a) affinity column chromatography, b) batch elution from an affinity matrix material, c) batch elution from an affinity material attached to a plate, d) fluorescence activated cell sorting, and e) electrophoresis in the presence of target material. "Affinity material" is used to mean a material with affinity for the material to be purified, called the "analyte". In most cases, the association of the affinity material and the analyte is reversible so that the analyte can be freed from the affinity material once the impurities are washed away.

The procedures described in sections V.H, V.I and V.J are not required for practicing the present invention, but may facilitate the development of novel binding proteins thereby.

V.B. Affinity Chromatography, Generally

Affinity column chromatography, batch elution from an affinity matrix material held in some container, and batch elution from a plate are very similar and hereinafter will be treated under "affinity chromatography."

If affinity chromatography is to be used, then:

1) the molecules of the target material must be of sufficient size and chemical reactivity to be applied to a solid support suitable for affinity separation,

2) after application to a matrix, the target material preferably does not react with water,

3) after application to a matrix, the target material preferably does not bind or degrade proteins in a non-specific way, and

4) the molecules of the target material must be sufficiently large that attaching the material to a matrix allows enough unaltered surface area (generally at least 500 Å², excluding the atom that is connected to the linker) for protein binding.

Affinity chromatography is the preferred separation means, but FACS, electrophoresis, or other means may also be used.

V.C. Fluorescence-Activated Cell Sorting, Generally

Fluorescence-activated cell sorting (FACS) involves use of an affinity material that is fluorescent per se or is labeled with a fluorescent molecule. Current commercially available cell sorters require 800 to 1000 molecules of fluorescent dye, such as Texas red, bound to each cell. FACS can sort 10³ cells or viruses/sec.

FACS (e.g. FACStar from Becton-Dickinson, Mountain View, Calif.) is most appropriate for bacterial cells and spores because the sensitivity of the machines requires approximately 1000 molecules of fluorescent label bound to each GP to accomplish a separation. OSPs such as OmpA, OmpF, OmpC are present at ≥10⁴/cell, often as much as 10⁵/cell. Thus use of FACS with PBDs displayed on one of the OSPs of a bacterial cell is attractive. This is particularly true if the target is quite small so that attachment to a matrix has a much greater effect than would attachment to a dye. To optimize FACS separation of GPs, we use a derivative of AfM(IPBD) that is labeled with a fluorescent molecule, denoted AfM(IPBD)*. The variables to be optimized include: a) amount of IPBD/GP, b) concentration of AfM(IPBD)*, c) ionic strength, d) concentration of GPs, and e) parameters pertaining to operation of the FACS machine. Because AfM(IPBD)* and GPs interact in solution, the binding will be linear in both [AfM(IPBD)*] and [displayed IPBD]. Preferably, these two parameters are varied together. The other parameters can be optimized independently.
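The figures quoted above imply practical limits that a short calculation makes visible. The following Python sketch is illustrative only; it assumes one fluorescent label per bound affinity molecule and one displayed domain per OSP copy, and the function names are hypothetical.

```python
def sort_time_hours(n_packages, rate_per_second=1.0e3):
    """Hours needed to pass n_packages through a sorter at the quoted rate."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return n_packages / rate_per_second / 3600.0


def required_label_fraction(osp_copies_per_cell, label_threshold=1000.0):
    """Fraction of displayed domains that must carry label to reach the threshold.

    A result above 1.0 means the threshold cannot be reached at that copy number.
    """
    if osp_copies_per_cell <= 0:
        raise ValueError("osp_copies_per_cell must be positive")
    return label_threshold / osp_copies_per_cell


print(f"1e7 GPs: {sort_time_hours(1.0e7):.1f} h of sorting")
for copies in (1.0e4, 1.0e5):
    fraction = required_label_fraction(copies)
    print(f"{copies:.0e} OSP copies/cell: {fraction:.0%} must be labeled")
```

On these assumptions, a population of 10⁷ GPs occupies the sorter for a few hours, and an OSP present at 10⁴ copies per cell leaves a tenfold margin over the detection threshold.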

If FACS is to be used as the affinity separation means, then:

1) the molecules of the target material must be of sufficient size and chemical reactivity to be conjugated to a suitable fluorescent dye or the target must itself be fluorescent,

2) after any necessary fluorescent labeling, the target preferably does not react with water,

3) after any necessary fluorescent labeling, the target material preferably does not bind or degrade proteins in a non-specific way, and

4) the molecules of the target material must be sufficiently large that attaching the material to a suitable dye allows enough unaltered surface area (generally at least 500 Å², excluding the atom that is connected to the linker) for protein binding.

V.D. Affinity Electrophoresis, Generally

Electrophoretic affinity separation involves electrophoresis of viruses or cells in the presence of target material, wherein the binding of said target material changes the net charge of the virus particles or cells. It has been used to separate bacteriophages on the basis of charge (SERW87).

Electrophoresis is most appropriate to bacteriophage because of their small size (SERW87). Electrophoresis is a preferred separation means if the target is so small that chemically attaching it to a column or to a fluorescent label would essentially change the entire target. For example, chloroacetate ions contain only seven atoms and would be essentially altered by any linkage. GPs that bind chloroacetate would become more negatively charged than GPs that do not bind the ion and so these classes of GPs could be separated.

If affinity electrophoresis is to be used, then:

1) the target must either be charged or of such a nature that its binding to a protein will change the charge of the protein,

2) the target material preferably does not react with water,

3) the target material preferably does not bind or degrade proteins in a non-specific way, and

4) the target must be compatible with a suitable gel material.

The present invention makes use of affinity separation of bacterial cells or bacterial viruses (or other genetic packages) to enrich a population for those cells or viruses carrying genes that code for proteins with desirable binding properties.

V.E. Target Materials

The present invention may be used to select for binding domains which bind to one or more target materials, and/or fail to bind to one or more target materials. Specificity, of course, is the ability of a binding molecule to bind strongly to a limited set of target materials, while binding more weakly or not at all to another set of target materials from which the first set must be distinguished.

The target materials may be organic macromolecules, such as polypeptides, lipids, polynucleic acids, and polysaccharides, but are not so limited. Almost any molecule that is stable in aqueous solvent may be used as a target. The following list of possible targets is given as illustration and not as limitation. The categories are not strictly mutually exclusive. The omission of any category is not to be construed to imply that said category is unsuitable as a target. Merck Index refers to the Eleventh Edition.

A. Peptides

1) human β endorphin (Merck Index 3528)

2) dynorphin (MI 3458)

3) Substance P (MI 8834)

4) Porcine somatostatin (MI 8671)

5) human atrial natriuretic factor (MI 887)

6) human calcitonin

7) glucagon

B. Proteins

I. Soluble Proteins

a. Hormones

1) human TNF (MI 9411)

2) Interleukin-1 (MI 4895)

3) Interferon-γ (MI 4894)

4) Thyrotropin (MI 9709)

5) Interferon-α (MI 4892)

6) Insulin (MI 4887, p.789)

b. Enzymes

1) human neutrophil elastase

2) Human thrombin

3) human Cathepsin G

4) human tryptase

5) human chymase

6) human blood clotting Factor Xa

7) any retro-viral Pol protease

8) any retro-viral Gag protease

9) dihydrofolate reductase

10) Pseudomonas putida cytochrome P450_(CAM)

11) human pyruvate kinase

12) E. coli pyruvate kinase

13) jack bean urease

14) aspartate transcarbamylase (E. coli)

15) ras protein

16) any protein-tyrosine kinase

c. Inhibitors

1) aprotinin (MI 784)

2) human α1-anti-trypsin

3) phage λ cI (inhibits DNA transcription)

d. Receptors

1) TNF receptor

2) IgE receptor

3) LamB

4) CD4

5) IL-1 receptor

e. Toxins

1) ricin (also an enzyme)

2) α Conotoxin GI

3) melittin

4) Bordetella pertussis adenylate cyclase (also an enzyme)

5) Pseudomonas aeruginosa hemolysin

f. Other proteins

1) horse heart myoglobin

2) human sickle-cell haemoglobin

3) human deoxy haemoglobin

4) human CO haemoglobin

5) human low-density lipoprotein (a lipoprotein)

6) human IgG (combining site removed or blocked) (a glycoprotein)

7) influenza haemagglutinin

8) phage λ capsid

9) fibrinogen

10) HIV-1 gp120

11) Neisseria gonorrhoeae pilin

12) fibril or flagellar protein from spirochaete bacterial species such as those that cause syphilis, Lyme disease, or relapsing fever

13) pro-enzymes such as prothrombin and trypsinogen

II. Insoluble Proteins

1) silk

2) human elastin

3) keratin

4) collagen

5) fibrin

C. Nucleic acids

a. DNA ##STR7##

b. RNA

1) yeast Phe tRNA

2) ribosomal RNA

3) segment of mRNA

D. Organic molecules (not peptide, protein, or nucleic acid)

I. Small and monomeric

1) cholesterol

2) aspartame

3) bilirubin

4) morphine

5) codeine

6) heroin

7) dichlorodiphenyltrichloroethane (DDT)

8) prostaglandin PGE2

9) actinomycin

10) 2,2,3-trimethyldecane

11) Buckminsterfullerene

12) cortivazol (MI 2536, p.397)

II. Polymers

1) cellulose

2) chitin

III. Others

1) O-antigen of Salmonella enteritidis (a lipopolysaccharide)

E. Inorganic compounds

1) asbestos

2) zeolites

3) hydroxylapatite

4) 111 face of crystalline silicon

5) paulingite

6) U(IV) (uranium ions)

7) Au(III) (gold ions)

F. Organometallic compounds

1) iron(III) haem

2) cobalt haem

3) cobalamin

4) (isopropylamino)₆ Cr(III)

Serine proteases are an especially interesting class of potential target materials. Serine proteases are ubiquitous in living organisms and play vital roles in processes such as: digestion, blood clotting, fibrinolysis, immune response, fertilization, and post-translational processing of peptide hormones. Although the role these enzymes play is vital, uncontrolled or inappropriate proteolytic activity can be very damaging. Several serine proteases are directly involved in serious disease states. Uncontrolled neutrophil elastase (NE) (also known as leukocyte elastase) is thought to be the major cause of emphysema (BEIT86, HUBB86, HUBB89, HUTC87, SOMM90, WEWE87) whether caused by congenital lack of α-1-antitrypsin or by smoking. NE is also implicated as an essential ingredient in the pernicious cycle of: ##STR8## observed in cystic fibrosis (CF) (NADE90). Inappropriate NE activity is very harmful and to stop the progression of emphysema or to alleviate the symptoms of CF, an inhibitor of very high affinity is needed. The inhibitor must be very specific to NE lest it inhibit other vital serine proteases or esterases. Nadel (NADE90) has suggested that onset of excess secretion is initiated by 10⁻¹⁰ M NE; thus, the inhibitor must reduce the concentration of free NE to well below this level. Thus human neutrophil elastase is a preferred target and a highly stable protein is a preferred IPBD. In particular, BPTI, ITI-DI, or another BPTI homologue is a preferred IPBD for development of an inhibitor to HNE. Other preferred IPBDs for making an inhibitor to HNE include CMTI-III, SLPI, Eglin, α-conotoxin GI, and Ω Conotoxins.
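The requirement that free NE be held well below 10⁻¹⁰ M can be related to the dissociation constant of the inhibitor by the usual equilibrium for 1:1 binding. The Python sketch below solves that equilibrium exactly; the total enzyme and inhibitor concentrations used in the illustration (1 nM and 10 nM) are hypothetical and are chosen only to show the trend.

```python
import math


def free_enzyme(e_total, i_total, kd):
    """Free enzyme concentration (M) at equilibrium for 1:1 reversible binding.

    Solves [E][I]/[EI] = kd with mass balance on enzyme and inhibitor.
    """
    s = e_total + i_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * e_total * i_total)) / 2.0
    return e_total - complex_conc


THRESHOLD = 1.0e-10  # M, level of NE suggested to initiate excess secretion (NADE90)

for kd in (1.0e-8, 1.0e-10, 1.0e-12):
    free = free_enzyme(1.0e-9, 1.0e-8, kd)  # hypothetical totals: 1 nM NE, 10 nM inhibitor
    status = "below" if free < THRESHOLD else "above"
    print(f"Kd = {kd:.0e} M: free NE = {free:.2e} M ({status} threshold)")
```

With these illustrative totals, an inhibitor with K_(d) near 10⁻⁸ M leaves free enzyme above the threshold, whereas one with K_(d) near 10⁻¹² M drives it far below.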

HNE is not the only serine protease for which an inhibitor would be valuable. Works concerning uses of protease inhibitors and diseases thought to result from inappropriate protease activity include: NADE87, REST88, SOMM90, and SOMM89. Tryptase and chymase may be involved in asthma, see FRAN89 and VAND89. There are reports that suggest that Proteinase 3 (also known as p29) is as important or even more important than HNE; see NILE89, ARNA90, KAOR88, CAMP90, and GUPT90. Cathepsin G is another protease that may cause disease when present in excess; see FERR90, PETE89, SALV87, and SOMM90. These works indicate that a problem exists and that blocking one or another protease might well alleviate a disease state. Some of the cited works report inhibitors having measurable affinity for a target protease, but none report truly excellent inhibitors that have K_(d) in the range of 10⁻¹² M as may be obtained by the method of the present invention. The same IPBDs used for HNE can be used for any serine protease.

The present invention is not, however, limited to any of the above-identified target materials. The only limitation is that the target material be suitable for affinity separation.

A supply of several milligrams of pure target material is desired. With HNE (as discussed in Examples II and III), 400 μg of enzyme is used to prepare 200 μl of ReactiGel beads. This amount of beads is sufficient for as many as 40 fractionations. Impure target material could be used, but one might obtain a protein that binds to a contaminant instead of to the target.
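The HNE figures translate into per-fractionation quantities by simple division. The sketch below assumes, purely for illustration, that all of the enzyme couples to the beads and that the beads are divided into equal portions; the function name is hypothetical.

```python
def per_fractionation(enzyme_ug, bead_volume_ul, n_fractionations):
    """Return (μl of beads, μg of enzyme) available for each fractionation."""
    if n_fractionations <= 0 or bead_volume_ul <= 0:
        raise ValueError("bead volume and number of fractionations must be positive")
    beads_each = bead_volume_ul / n_fractionations
    loading = enzyme_ug / bead_volume_ul  # μg enzyme per μl beads, full coupling assumed
    return beads_each, loading * beads_each


beads_ul, enzyme_ug = per_fractionation(400.0, 200.0, 40)
print(f"{beads_ul:.0f} μl beads and {enzyme_ug:.0f} μg HNE per fractionation")
```

On that basis a supply of a few milligrams of target covers several such bead preparations.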

The following information about the target material is highly desirable: 1) stability as a function of temperature, pH, and ionic strength, 2) stability with respect to chaotropes such as urea or guanidinium Cl, 3) pI, 4) molecular weight, 5) requirements for prosthetic groups or ions, such as haem or Ca²⁺, and 6) proteolytic activity, if any. It is also potentially useful to know: 1) the target's sequence, if the target is a macromolecule, 2) the 3D structure of the target, 3) enzymatic activity, if any, and 4) toxicity, if any.

The user of the present invention specifies certain parameters of theintended use of the binding protein: 1) the acceptable temperaturerange, 2) the acceptable pH range, 3) the acceptable concentrations ofions and neutral solutes, and 4) the maximum acceptable dissociationconstant for the target and the SBD:

    K.sub.T =[Target][SBD]/[Target:SBD].

In some cases, the user may require discrimination between T, thetarget, and N, some non-target. Let

    K_(T) = [T][SBD]/[T:SBD], and

    K_(N) = [N][SBD]/[N:SBD],

then K_(T)/K_(N) = ([T][N:SBD])/([N][T:SBD]).

The user then specifies a maximum acceptable value for the ratio K_(T)/K_(N).
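The two user-specified limits can be checked numerically. The following Python sketch is offered only as an illustration; the function names, the molar units, and the example concentrations are assumptions and do not come from the Examples.

```python
def dissociation_constant(free_ligand_M, free_sbd_M, complex_M):
    """K = [ligand][SBD]/[ligand:SBD], all concentrations in mol/L."""
    if complex_M <= 0:
        raise ValueError("complex concentration must be positive")
    return free_ligand_M * free_sbd_M / complex_M


def meets_user_spec(k_t, k_n, max_k_t, max_ratio):
    """True if K_T and the ratio K_T/K_N are both within the user's limits."""
    return k_t <= max_k_t and (k_t / k_n) <= max_ratio


# Hypothetical equilibrium concentrations (mol/L), for illustration only.
k_t = dissociation_constant(1e-9, 1e-9, 1e-6)   # target T
k_n = dissociation_constant(1e-6, 1e-9, 1e-9)   # non-target N
print(k_t, k_n, meets_user_spec(k_t, k_n, max_k_t=1e-9, max_ratio=1e-3))
```

A small K_(T)/K_(N) means that the SBD binds the target much more tightly than the non-target, so the limit on the ratio is an upper bound, as in the text.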

The target material preferably is stable under the specified conditions of pH, temperature, and solution conditions.

If the target material is a protease, one considers the following points:

1) a highly specific protease can be treated like any other target,

2) a general protease, such as subtilisin, may degrade the OSPs of the GP including OSP-PBDs; there are several alternative ways of dealing with general proteases, including: a) use a protease inhibitor as PPBD so that the SBD is an inhibitor of the protease, b) a chemical inhibitor may be used to prevent proteolysis (e.g. phenylmethylsulfonyl fluoride (PMSF), which inhibits serine proteases), c) one or more active-site residues may be mutated to create an inactive protein (e.g. a serine protease in which the active serine is mutated to alanine), or d) one or more active-site amino acids of the protein may be chemically modified to destroy the catalytic activity (e.g. a serine protease in which the active serine is converted to anhydroserine),

3) SBDs selected for binding to a protease need not be inhibitors; SBDs that happen to inhibit the protease target are a fairly small subset of SBDs that bind to the protease target,

4) the more we modify the target protease, the less likely we are to obtain an SBD that inhibits the target protease, and

5) if the user requires that the SBD inhibit the target protease, then the active site of the target protease must not be modified any more than necessary; inactivation by mutation and inactivation by chemical modification are the preferred methods, and a protein protease inhibitor becomes a prime candidate for IPBD. For example, BPTI has been mutated, by the methods of the present invention, to bind to proteases other than trypsin.

Examples III-VI disclose that uninhibited serine proteases may be used as targets quite successfully and that protein protease inhibitors derived from BPTI and selected for binding to these immobilized proteases are excellent inhibitors.

V.F. Immobilization or Labeling of Target Material

For chromatography, FACS, or electrophoresis there may be a need to covalently link the target material to a second chemical entity. For chromatography the second entity is a matrix, for FACS the second entity is a fluorescent dye, and for electrophoresis the second entity is a strongly charged molecule. In many cases, no coupling is required because the target material already has the desired property of: a) immobility, b) fluorescence, or c) charge. In other cases, chemical or physical coupling is required.

Various means may be used to immobilize or label the target materials. The means of immobilization or labeling is, in part, determined by the nature of the target. In particular, the physical and chemical nature of the target material and its functional groups determine which types of immobilization reagents may be most easily used.

For the purpose of selecting an immobilization method, it may be more helpful to classify target materials as follows: (a) solid, whether crystalline or amorphous, and insoluble in an aqueous solvent (e.g., many minerals, and fibrous organics such as cellulose and silk); (b) solid, whether crystalline or amorphous, and soluble in an aqueous solvent; (c) liquid, but insoluble in aqueous phase (e.g., 2,3,3-trimethyldecane); or (d) liquid, and soluble in aqueous media.

It is not necessary that the actual target material be used in preparing the immobilized or labeled analogue that is to be used in affinity separation; rather, suitable reactive analogues of the target material may be more convenient. If 2,3,3-trimethyldecane were the target material, for example, then 2,3,3-trimethyl-10-aminodecane would be far easier to immobilize than the parental compound. Because the latter compound is modified at one end of the chain, it retains almost all of the shape and charge attributes that differentiate the former compound from other alkanes.

Target materials that do not have reactive functional groups may be immobilized by first creating a reactive functional group through the use of some powerful reagent, such as a halogen. For example, an alkane can be immobilized for affinity separation by first halogenating it and then reacting the halogenated derivative with an immobilized or immobilizable amine.

In some cases, the reactive groups of the actual target material may occupy a part on the target molecule that is to be left undisturbed. In that case, additional functional groups may be introduced by synthetic chemistry. For example, the most reactive groups in cholesterol are on the steroid ring system, viz., --OH and >C═C. We may wish to leave this ring system as it is so that it binds to the novel binding protein. In this case, we prepare an analogue having a reactive group attached to the aliphatic chain (such as 26-aminocholesterol) and immobilize this derivative in a manner appropriate to the reactive group so attached.

Two very general methods of immobilization are widely used. The first is to biotinylate the compound of interest and then bind the biotinylated derivative to immobilized avidin. The second method is to generate antibodies to the target material, immobilize the antibodies by any of numerous methods, and then bind the target material to the immobilized antibodies. Use of antibodies is more appropriate for larger target materials; small targets (those comprising, for example, ten or fewer non-hydrogen atoms) may be so completely engulfed by an antibody that very little of the target is exposed in the target-antibody complex.

Non-covalent immobilization of hydrophobic molecules without resort to antibodies may also be used. A compound, such as 2,3,3-trimethyldecane, is blended with a matrix precursor, such as sodium alginate, and the mixture is extruded into a hardening solution. The resulting beads will have 2,3,3-trimethyldecane dispersed throughout and exposed on the surface.

Other immobilization methods depend on the presence of particular chemical functionalities. A polypeptide will present --NH₂ (N-terminal; Lysines), --COOH (C-terminal; Aspartic Acids; Glutamic Acids), --OH (Serines; Threonines; Tyrosines), and --SH (Cysteines). A polysaccharide has free --OH groups, as does DNA, which has a sugar backbone.

The following table is a nonexhaustive review of reactive functional groups and potential immobilization reagents:

    ______________________________________
    Group            Reagent
    ______________________________________
    R-NH₂            Derivatives of 2,4,6-trinitrobenzene
                     sulfonates (TNBS) (CREI84, p. 11)
    R-NH₂            Carboxylic acid anhydrides, e.g.
                     derivatives of succinic anhydride,
                     maleic anhydride, citraconic
                     anhydride (CREI84, p. 11)
    R-NH₂            Aldehydes that form reducible
                     Schiff bases (CREI84, p. 12)
    guanido          Cyclohexanedione derivatives
                     (CREI84, p. 14)
    R-CO₂H           Diazo compounds (CREI84, p. 10)
    R-CO₂--          Epoxides (CREI84, p. 10)
    R-OH             Carboxylic acid anhydrides
    Aryl-OH          Carboxylic acid anhydrides
    Indole ring      Benzyl halide and sulfenyl
                     halides (CREI84, p. 19)
    R-SH             N-alkylmaleimides (CREI84, p. 21)
    R-SH             Ethyleneimine derivatives
                     (CREI84, p. 21)
    R-SH             Aryl mercury compounds
                     (CREI84, p. 21)
    R-SH             Disulfide reagents (CREI84, p. 23)
    Thiol ethers     Alkyl iodides (CREI84, p. 20)
    Ketones          Make Schiff's base and reduce
                     with NaBH₄ (CREI84, p. 12-13)
    Aldehydes        Oxidize to COOH, vide supra.
    R-SO₃H           Convert to R-SO₂Cl and react with
                     immobilized alcohol or amine.
    R-PO₃H           Convert to R-PO₂Cl and react with
                     immobilized alcohol or amine.
    CC double bonds  Add HBr and then make amine
                     or thiol.
    ______________________________________

The next table identifies the reactive groups of a number of potential targets.

    ______________________________________
                                   Reactive groups or
    Compound (Item #, page)*       [derivatives]
    ______________________________________
    prostaglandin E2 (2893, 1251)  --OH, keto, --COOH, C═C
    aspartame (861, 132)           --NH₂, --COOH, --COOCH₃
    haem (4558, 732)               vinyl, --COOH, Fe
    bilirubin (1235, 189)          vinyl, --COOH, keto, --NH--
    morphine (6186, 988)           --OH, --C═C--, reactive phenyl ring
    codeine (2459, 384)            --OH, --C═C--, reactive phenyl ring
    dichlorodiphenyltrichloroethane
    (2832, 446)                    aromatic chlorine, aliphatic chlorine
    benzo(a)pyrene (1113, 172)     [Chlorinate→amine, or make
                                   sulfonate→Aryl-SO₂Cl]
    actinomycin D (2804, 441)      aryl-NH₂, --OH
    cellulose                      self immobilized
    hydroxylapatite                self immobilized
    cholesterol (2204, 341)        --OH, >C═C--
    ______________________________________
    *Note: Item # and page refer to the Merck Index, 11th Edition.

The extensive literature on affinity chromatography and related techniques will provide further examples.

Matrices suitable for use as support materials include polystyrene, glass, agarose and other chromatographic supports, and may be fabricated into beads, sheets, columns, wells, and other forms as desired. Suppliers of support material for affinity chromatography include: Applied Protein Technologies, Cambridge, Mass.; BioRad Laboratories, Rockville Center, N.Y.; Pierce Chemical Company, Rockford, Ill. Target materials are attached to the matrix in accord with the directions of the manufacturer of each matrix preparation with consideration of good presentation of the target.

Early in the selection process, relatively high concentrations of target materials may be applied to the matrix to facilitate binding; target concentrations may subsequently be reduced to select for higher affinity SBDs.

V.G. Elution of Lower Affinity PBD-Bearing Genetic Packages

The population of GPs is applied to an affinity matrix under conditions compatible with the intended use of the binding protein and the population is fractionated by passage of a gradient of some solute over the column. The process enriches for PBDs having affinity for the target and for which the affinity for the target is least affected by the eluants used. The enriched fractions are those containing viable GPs that elute from the column at greater concentration of the eluant.

The eluants preferably are capable of weakening noncovalent interactions between the displayed PBDs and the immobilized target material. Preferably, the eluants do not kill the genetic package; the genetic message corresponding to successful mini-proteins is most conveniently amplified by reproducing the genetic package rather than by in vitro procedures such as PCR. The list of potential eluants includes salts (including Na+, NH₄+, Rb+, SO₄--, H₂PO₄-, citrate, K+, Li+, Cs+, HSO₄-, CO₃--, Ca++, Sr++, Cl-, PO₄---, HCO₃-, Mg++, Ba++, Br-, HPO₄-- and acetate), acid, heat, compounds known to bind the target, and soluble target material (or analogues thereof).

Because bacteria continue to metabolize during affinity separation, the choice of buffer components is more restricted for bacteria than for bacteriophage or spores. Neutral solutes, such as ethanol, acetone, ether, or urea, are frequently used in protein purification and are known to weaken non-covalent interactions between proteins and other molecules. Many of these species are, however, very harmful to bacteria and bacteriophage. Urea is known not to harm M13 up to 8 M. Bacterial spores, on the other hand, are impervious to most neutral solutes. Several affinity separation passes may be made within a single round of variegation. Different solutes may be used in different analyses, salt in one, pH in the next, etc.

Any ions or cofactors needed for stability of PBDs (derived from IPBD) or target are included in initial and elution buffers at appropriate levels. We first remove GP(PBD)s that do not bind the target by washing the matrix with the initial buffer. We determine that this phase of washing is complete by plating aliquots of the washes or by measuring the optical density (at 260 nm or 280 nm). The matrix is then eluted with a gradient of increasing: a) salt, b) [H+] (decreasing pH), c) neutral solutes, d) temperature (increasing or decreasing), or e) some combination of these factors. The solutes in each of the first three gradients have been found generally to weaken non-covalent interactions between proteins and bound molecules. Salt is a preferred solute for gradient formation in most cases. Decreasing pH is also a highly preferred eluant. In some cases, the preferred matrix is not stable to low pH so that salt and urea are the most preferred reagents. Other solutes that generally weaken non-covalent interaction between proteins and the target material of interest may also be used.

The uneluted genetic packages contain DNA encoding binding domains which have a sufficiently high affinity for the target material to resist the elution conditions. The DNA encoding such successful binding domains may be recovered in a variety of ways. Preferably, the bound genetic packages are simply eluted by means of a change in the elution conditions. Alternatively, one may culture the genetic package in situ, or extract the target-containing matrix with phenol (or other suitable solvent) and amplify the DNA by PCR or by recombinant DNA techniques. Additionally, if a site for a specific protease has been engineered into the display vector, the specific protease is used to cleave the binding domain from the GP.

V.H. Optimization of Affinity Chromatography Separation

For linear gradients, elution volume and eluant concentration are directly related. Changes in eluant concentration cause GPs to elute from the column. Elution volume, however, is more easily measured and specified. It is to be understood that the eluant concentration is the agent causing GP release and that an eluant concentration can be calculated from an elution volume and the specified gradient.
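The conversion from elution volume to eluant concentration for a linear gradient can be sketched in Python as follows. This is a minimal illustration, not part of the disclosed procedure; the function name, the units, and the numbers in the example are assumptions.

```python
def eluant_concentration(elution_volume, start_conc, end_conc, gradient_volume):
    """Eluant concentration at a given elution volume for a linear gradient.

    The concentration rises linearly from start_conc at volume 0 to
    end_conc at gradient_volume and is held at end_conc thereafter.
    Volumes share one unit; concentrations share another.
    """
    if gradient_volume <= 0:
        raise ValueError("gradient_volume must be positive")
    fraction = min(max(elution_volume / gradient_volume, 0.0), 1.0)
    return start_conc + (end_conc - start_conc) * fraction


# Hypothetical example: 10 mM to 1010 mM over 5 column volumes.
print(eluant_concentration(2.5, 10.0, 1010.0, 5.0))  # midpoint of the gradient
```

Because the relation is monotonic, an elution volume recorded during a run identifies one eluant concentration, which is the quantity that actually causes release of the GPs.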

Using a specified elution regime, we compare the elution volumes of GP(IPBD)s with the elution volumes of wtGP on affinity columns supporting AfM(IPBD). Comparisons are made at various: a) amounts of IPBD/GP, b) densities of AfM(IPBD)/(volume of matrix) (DoAMoM), c) initial ionic strengths, d) elution rates, e) amounts of GP/(volume of support), f) pHs, and g) temperatures, because these are the parameters most likely to affect the sensitivity and efficiency of the separation. We then pick those conditions giving the best separation.

We do not optimize pH or temperature; rather we record optimal values for the other parameters for one or more values of pH and temperature. The pH used must be within the range of pH for which GP(IPBD) binds the AfM(IPBD) that is being used in this step. The conditions of intended use specified by the user may include a specification of pH or temperature. If pH is specified, then pH will not be varied in eluting the column. Decreasing pH may, however, be used to liberate bound GPs from the matrix. Similarly, if the intended use specifies a temperature, we will hold the affinity column at the specified temperature during elution, but we might vary the temperature during recovery. If the intended use specifies the pH or temperature, then we prefer that the affinity separation be optimized for all other parameters at the specified pH and temperature.

In the optimization devised in this step, we preferably use a molecule known to have moderate affinity for the IPBD (K_(d) in the range 10⁻⁶ M to 10⁻⁸ M), for the following reason. When populations of GP(vgPBD)s are fractionated, there will be roughly three subpopulations: a) those with no binding, b) those that have some binding but can be washed off with high salt or low pH, and c) those that bind very tightly and are most easily rescued in situ. We optimize the parameters to separate (a) from (b) rather than (b) from (c). Let PBD_(w) be a PBD having weak binding to the target and PBD_(s) be a PBD having strong binding. Higher DoAMoM might, for example, favor retention of GP(PBD_(w)) but also make it very difficult to elute viable GP(PBD_(s)). We will optimize the affinity separation to retain GP(PBD_(w)) rather than to allow release of GP(PBD_(s)) because a tightly bound GP(PBD_(s)) can be rescued by in situ growth. If we find that DoAMoM strongly affects the elution volume, then in Part III we may reduce the amount of target on the affinity column when an SBD has been found with moderately strong affinity (K_(d) on the order of 10⁻⁷ M) for the target.

In case the promoter of the osp-ipbd gene is not regulated by a chemical inducer, we optimize DoAMoM, the elution rate, and the amount of GP/volume of matrix. If the optimized affinity separation is acceptable, we proceed. If not, we develop a means to alter the amount of IPBD per GP. Among GPs considered in the present invention, this case could arise only for spores because regulatable promoters are available for all other systems.

If the amount of IPBD/spore is too high, we could engineer an operator site into the osp-ipbd gene. We choose the operator sequence such that a repressor sensitive to a small diffusible inducer recognizes the operator. Alternatively, we could alter the Shine-Dalgarno sequence to produce a lower homology with consensus Shine-Dalgarno sequences. If the amount of IPBD/spore is too low, we can introduce variability into the promoter or Shine-Dalgarno sequences and screen colonies for higher amounts of IPBD/spore.

In this step, we measure elution volumes of genetically pure GPs that elute from the affinity matrix as sharp bands that can be detected by UV absorption. Alternatively, samples from effluent fractions can be plated on suitable medium (cells or spores) or on sensitive cells (phage) and colonies or plaques counted.

Several values of IPBD/GP, DoAMoM, elution rates, initial ionic strengths, and loadings should be examined. The following is only one of many ways in which the affinity separation could be optimized. We anticipate that optimal values of IPBD/GP and DoAMoM will be correlated and therefore should be optimized together. The effects of initial ionic strength, elution rate, and amount of GP/(matrix volume) are unlikely to be strongly correlated, and so they can be optimized independently.

For each set of parameters to be tested, the column is eluted in a specified manner. For example, we may use a regime called Elution Regime 1: a KCl gradient runs from 10 mM to maximum allowed for the GP(IPBD) viability in 100 fractions of 0.05 V_(V), followed by 20 fractions of 0.05 V_(V) at maximum allowed KCl; pH of the buffer is maintained at the specified value with a convenient buffer such as phosphate, Tris, or MOPS. Other elution regimes can be used; what is important is that the conditions of this optimization be similar to the conditions that are used in Part III for selection for binding to target and recovery of GPs from the chromatographic system.
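Elution Regime 1 can be written out as a fraction-by-fraction schedule. The Python sketch below is illustrative; it assumes that the gradient is linear, that the KCl concentration of each fraction is taken at the end of that fraction, and that the maximum KCl tolerated by the GP(IPBD) is supplied by the user (the 1000 mM used in the example is hypothetical).

```python
def elution_regime_1(max_kcl_mM, start_kcl_mM=10.0, n_gradient=100,
                     n_hold=20, fraction_vol=0.05):
    """Return a list of (cumulative volume in V_V, KCl in mM), one per fraction.

    n_gradient fractions form a linear gradient from start_kcl_mM to
    max_kcl_mM; n_hold further fractions are collected at max_kcl_mM.
    """
    schedule = []
    for i in range(1, n_gradient + 1):
        conc = start_kcl_mM + (max_kcl_mM - start_kcl_mM) * i / n_gradient
        schedule.append((i * fraction_vol, conc))
    for j in range(1, n_hold + 1):
        schedule.append(((n_gradient + j) * fraction_vol, max_kcl_mM))
    return schedule


schedule = elution_regime_1(max_kcl_mM=1000.0)  # hypothetical viability limit
print(len(schedule), schedule[0], schedule[-1])
```

With these defaults the regime comprises 120 fractions and a total elution volume of about 6 V_(V), of which the last 1 V_(V) is at the maximum allowed KCl.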

When the osp-ipbd gene is regulated by [XINDUCE], IPBD/GP can be controlled by varying [XINDUCE]. Appropriate values of [XINDUCE] depend on the identity of [XINDUCE] and the promoter; if, for example, XINDUCE is isopropylthiogalactoside (IPTG) and the promoter is lacUV5, then [IPTG] = 0, 0.1 μM, 1.0 μM, 10.0 μM, 100.0 μM, and 1.0 mM would be appropriate levels to test. The range of variation of [XINDUCE] is extended until an optimum is found or an acceptable level of expression is obtained.

DoAMoM is varied from the maximum that the matrix material can bind to 1% or 0.1% of this level in appropriate steps. We anticipate that the efficiency of separation will be a smooth function of DoAMoM so that it is appropriate to cover a wide range of values for DoAMoM with a coarse grid and then explore the neighborhood of the approximate optimum with a finer grid.
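The coarse-then-fine search over DoAMoM can be sketched as follows. This Python fragment is only one possible reading of the procedure. The callable `score` is a hypothetical stand-in for the measured quality of separation at a given DoAMoM (in practice each evaluation is a chromatography run), and the particular coarse steps and the linear spacing of the fine grid are assumptions.

```python
def coarse_then_fine(score, max_density,
                     coarse_fractions=(1.0, 0.3, 0.1, 0.03, 0.01, 0.003, 0.001),
                     n_fine=5):
    """Pick the DoAMoM giving the best score using a coarse then a finer grid.

    score: hypothetical callable mapping a density to a separation quality
           (higher is better).
    max_density: the maximum DoAMoM that the matrix material can bind.
    """
    coarse = [max_density * f for f in coarse_fractions]   # descending grid
    best = max(coarse, key=score)
    i = coarse.index(best)
    upper = coarse[max(i - 1, 0)]                 # coarse neighbour above
    lower = coarse[min(i + 1, len(coarse) - 1)]   # coarse neighbour below
    # Finer grid, evenly spaced between the two coarse neighbours.
    fine = [lower + (upper - lower) * k / (n_fine + 1)
            for k in range(1, n_fine + 1)]
    return max([best] + fine, key=score)


# Toy score with an optimum at half the maximum density (illustration only).
print(coarse_then_fine(lambda d: -abs(d - 0.5), max_density=1.0))
```

The approach relies on the anticipated smoothness of the separation efficiency as a function of DoAMoM; if the response were irregular, the coarse grid could miss the optimum.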

Several values of initial ionic strength are tested, such as 1.0 mM, 5.0 mM, 10.0 mM and 20.0 mM. Low ionic strength favors binding between oppositely charged groups, but could also cause GP to precipitate.

The elution rate is varied, by successive factors of 1/2, from the maximum attainable rate to 1/16 of this value. If the lowest elution rate tested gives the best separation, we test lower elution rates until we find an optimum or adequate separation.
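The halving series of elution rates, and its extension when the slowest rate tested is still the best, can be sketched as follows. The callable `score` is again a hypothetical stand-in for the measured quality of separation, and the cap on extra halvings is an assumption added so that the loop always terminates.

```python
def best_elution_rate(score, max_rate, n_halvings=4, max_extra=8):
    """Test max_rate, max_rate/2, ..., max_rate/16, then go lower if needed.

    score: hypothetical callable mapping an elution rate to a separation
           quality (higher is better).
    Returns (best_rate, list_of_rates_tested).
    """
    rates = [max_rate / 2 ** k for k in range(n_halvings + 1)]
    extra = 0
    # While the lowest rate tested is the best one, try a rate half as fast.
    while max(rates, key=score) == rates[-1] and extra < max_extra:
        rates.append(rates[-1] / 2)
        extra += 1
    return max(rates, key=score), rates


# Toy score with an optimum at 0.25 (arbitrary units), for illustration only.
best, tested = best_elution_rate(lambda r: -abs(r - 0.25), max_rate=16.0)
print(best, tested)
```

In this toy case the initial series stops at 1/16 of the maximum, and three further halvings are needed before a rate lower than the optimum has been tested.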

The goal of the optimization is to obtain a sharp transition between bound and unbound GPs, triggered by increasing salt or decreasing pH or a combination of both. This optimization need be performed only: a) for each temperature to be used, b) for each pH to be used, and c) when a new GP(IPBD) is created.

V.I. Measuring the sensitivity of affinity separation

Once the values of IPBD/GP, DoAMoM, initial ionic strength, elution rate, and amount of GP/(volume of affinity support) have been optimized, we determine the sensitivity of the affinity separation (C_(sensi)) by the following procedure that measures the minimum quantity of GP(IPBD) that can be detected in the presence of a large excess of wtGP. The user chooses a number of separation cycles, denoted N_(chrom), that will be performed before an enrichment is abandoned; preferably, N_(chrom) is in the range 6 to 10 and N_(chrom) must be greater than 4. Enrichment can be terminated by isolation of a desired GP(SBD) before N_(chrom) passes.

The measurement of sensitivity is significantly expedited if GP(IPBD) and wtGP carry different selectable markers because such markers allow easy identification of colonies obtained by plating fractions obtained from the chromatography column. For example, if wtGP carries kanamycin resistance and GP(IPBD) carries ampicillin resistance, we can plate fractions from a column on non-selective media suitable for the GP. Transfer of colonies onto ampicillin- or kanamycin-containing media will determine the identity of each colony.

Mixtures of GP(IPBD) and wtGP are prepared in the ratios of 1:V_(lim), where V_(lim) ranges by an appropriate factor (e.g. 1/10) over an appropriate range, typically 10¹¹ through 10⁴. Large values of V_(lim) are tested first; once a positive result is obtained for one value of V_(lim), no smaller values of V_(lim) need be tested. Each mixture is applied to a column supporting, at the optimal DoAMoM, an AfM(IPBD) having high affinity for IPBD and the column is eluted by the specified elution regime, such as Elution Regime 1. The last fraction that contains viable GPs and an inoculum of the column matrix material are cultured. If GP(IPBD) and wtGP have different selectable markers, then transfer onto selection plates identifies each colony. If GP(IPBD) and wtGP have no selectable markers or the same selectable markers, then a number (e.g. 32) of GP clonal isolates are tested for presence of IPBD. If IPBD is not detected on the surface of any of the isolated GPs, then GPs are pooled from: a) the last few (e.g. 3 to 5) fractions that contain viable GPs, and b) an inoculum taken from the column matrix. The pooled GPs are cultured and passed over the same column and enriched for GP(IPBD) in the manner described. This process is repeated until N_(chrom) passes have been performed, or until the IPBD has been detected on the GPs. If GP(IPBD) is not detected after N_(chrom) passes, V_(lim) is decreased and the process is repeated.

Once a value for V_(lim) is found that allows recovery of GP(IPBD)s, the factor by which V_(lim) is varied is reduced and additional values are tested until V_(lim) is known to within a factor of two.

C_(sensi) equals the highest value of V_(lim) for which the user can recover GP(IPBD) within N_(chrom) passes. The number of chromatographic cycles (K_(cyc)) that were needed to isolate GP(IPBD) gives a rough estimate of C_(eff); C_(eff) is approximately the K_(cyc)-th root of V_(lim): C_(eff) ≈ exp{log_(e)(V_(lim))/K_(cyc)}.

For example, if V_(lim) were 4.0×10⁸ and three separation cycles were needed to isolate GP(IPBD), then C_(eff) ≈ 736.
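The rough estimate of C_(eff) as the K_(cyc)-th root of V_(lim) can be reproduced with a few lines of Python. The function name is hypothetical; the formula is the one given in the text.

```python
import math


def estimate_c_eff(v_lim, k_cyc):
    """Rough per-cycle enrichment: the k_cyc-th root of v_lim."""
    if v_lim <= 0 or k_cyc <= 0:
        raise ValueError("v_lim and k_cyc must be positive")
    return math.exp(math.log(v_lim) / k_cyc)


print(estimate_c_eff(4.0e8, 3))
```

For V_(lim) = 4.0×10⁸ and K_(cyc) = 3 this evaluates to roughly 736.8, in agreement with the figure quoted in the example.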

V.J. Measuring the efficiency of separation

To determine C_(eff) more accurately, we determine the ratio of GP(IPBD)/wtGP loaded onto an AfM(IPBD) column that yields approximately equal amounts of GP(IPBD) and wtGP after elution. We prepare mixtures of GP(IPBD) and wtGP in ratios GP(IPBD):wtGP::1:Q; we start Q at twenty times the approximate C_(eff) found above. A 1:Q mixture of GP(IPBD) and wtGP is applied to an AfM(IPBD) column and eluted by the specified elution regime, such as Elution Regime 1. A sample of the last fraction that contains viable GPs is plated at a dilution that gives well separated colonies or plaques. The presence of IPBD or the osp-ipbd gene in each colony or plaque can be determined by a number of standard methods, including: a) use of different selectable markers, b) nitrocellulose filter lift of GPs and detection with AfM(IPBD)* (AUSU87), or c) nitrocellulose filter lift of GPs and detection with radiolabeled DNA that is complementary to the osp-ipbd gene (AUSU87). Let F be the fraction of GP(IPBD) colonies found in the last fraction containing viable GPs. When a Q is found such that 0.20<F<0.80, then

    C_(eff) = Q*F.

If F<0.2, then we reduce Q by an appropriate factor (e.g. 1/10) and repeat the procedure. If F>0.8, then we increase Q by an appropriate factor (e.g. 2) and repeat the procedure.
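The adjustment of Q can be sketched as a loop. In the Python fragment below, `measure_fraction` is a hypothetical stand-in for the laboratory measurement of F at a given Q; the toy model in the usage example, the treatment of F exactly equal to 0.20 or 0.80, and the limit on the number of rounds are all assumptions made for illustration.

```python
def refine_c_eff(measure_fraction, q_start, max_rounds=20):
    """Adjust Q until 0.20 < F < 0.80, then return (C_eff, Q) with C_eff = Q*F.

    measure_fraction: hypothetical callable returning F, the fraction of
        GP(IPBD) colonies in the last fraction containing viable GPs,
        for a 1:Q loading mixture.
    Returns None if no suitable Q is found within max_rounds.
    """
    q = q_start
    for _ in range(max_rounds):
        f = measure_fraction(q)
        if 0.20 < f < 0.80:
            return q * f, q
        if f <= 0.20:
            q = q / 10   # too few GP(IPBD) recovered: reduce Q by 1/10
        else:
            q = q * 2    # nearly pure GP(IPBD): increase Q by a factor of 2
    return None


# Toy stand-in for the experiment (assumption): F = c / (c + Q) with c = 500.
toy_measurement = lambda q: 500.0 / (500.0 + q)
print(refine_c_eff(toy_measurement, q_start=20 * 736))
```

Starting Q at twenty times the rough estimate, as the text suggests, means that the first measurement usually gives a small F, so the first adjustment is normally a reduction of Q.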

V.K. Reducing selection due to non-specific binding

When affinity chromatography is used for separating bound and unbound GPs, we may reduce non-specific binding of GP(PBD)s to the matrix that bears the target in the following ways:

1) we treat the column with blocking agents such as genetically defective GPs or a solution of protein before the population of GP(vgPBD)s is chromatographed, and

2) we pass the population of GP(vgPBD)s over a matrix containing no target or a different target from the same class as the actual target prior to affinity chromatography.

Step (1) above saturates any non-specific binding that the affinity matrix might show toward wild-type GPs or proteins in general; step (2) removes components of our population that exhibit non-specific binding to the matrix or to molecules of the same class as the target. If the target were horse heart myoglobin, for example, a column supporting bovine serum albumin could be used to trap GPs exhibiting PBDs with strong non-specific binding to proteins. If cholesterol were the target, then a hydrophobic compound, such as p-tertiarybutylbenzyl alcohol, could be used to remove GPs displaying PBDs having strong non-specific binding to hydrophobic compounds. It is anticipated that PBDs that fail to fold or that are prematurely terminated will be non-specifically sticky. These sequences could outnumber the PBDs having desirable binding properties. Thus, the capacity of the initial column that removes indiscriminately adhesive PBDs should be greater (e.g. 5 fold greater) than the column that supports the target molecule.

Variation in the support material (polystyrene, glass, agarose, cellulose, etc.) in analysis of clones carrying SBDs is used to eliminate enrichment for packages that bind to the support material rather than the target.

FACS may be used to separate GPs that bind fluorescently labeled target. We discriminate against artifactual binding to the fluorescent label by using two or more different dyes, chosen to be structurally different. GPs isolated using target labeled with a first dye are cultured. These GPs are then tested with target labeled with a second dye.

Electrophoretic affinity separation uses unaltered target so that only other ions in the buffer can give rise to artifactual binding. Artifactual binding to the gel material gives rise to retardation independent of field direction and so is easily eliminated.

A variegated population of GPs will have a variety of charges. The following 2D electrophoretic procedure accommodates this variation in the population. First the variegated population of GPs is electrophoresed in a gel that contains no target material. The electrophoresis continues until the GPs are distributed along the length of the lane. The gels described by Serwer for phage are very low in agarose and lack mechanical stability. The target-free lane in which the initial electrophoresis is conducted is separated from a square of gel that contains target material by a removable baffle. After the first pass, the baffle is removed and a second electrophoresis is conducted at right angles to the first. GPs that do not bind target migrate with unaltered mobility while GPs that do bind target will separate from the majority that do not bind target. A diagonal line of non-binding GPs will form. This line is excised and discarded. Other parts of the gel are dissolved and the GPs cultured.

V.L. Isolation of GP(PBD)s with binding-to-target phenotypes

The harvested packages are now enriched for the binding-to-target phenotype by use of affinity separation involving the target material immobilized on an affinity matrix. Packages that fail to bind to the target material are washed away. If the packages are bacteriophage or endospores, it may be desirable to include a bactericidal agent, such as azide, in the buffer to prevent bacterial growth. The buffers used in chromatography include: a) any ions or other solutes needed to stabilize the target, and b) any ions or other solutes needed to stabilize the PBDs derived from the IPBD.

V.M. Recovery of Packages

Recovery of packages that display binding to an affinity column may be achieved in several ways, including:

1) collect fractions eluted from the column with a gradient as described above; fractions eluting later in the gradient contain GPs more enriched for genes encoding PBDs with high affinity for the column,

2) elute the column with the target material in soluble form,

3) flood the matrix with a nutritive medium and grow the desired packages in situ,

4) remove parts of the matrix and use them to inoculate growth medium,

5) chemically or enzymatically degrade the linkage holding the target to the matrix so that GPs still bound to target are eluted, or

6) degrade the packages and recover DNA with phenol or other suitable solvent; the recovered DNA is used to transform cells that regenerate GPs.

It is possible to utilize combinations of these methods. It should be remembered that what we want to recover from the affinity matrix is not the GPs per se, but the information in them. Recovery of viable GPs is very strongly preferred, but recovery of genetic material is essential. If cells, spores, or virions bind irreversibly to the matrix but are not killed, we can recover the information through in situ cell division, germination, or infection respectively. Proteolytic degradation of the packages and recovery of DNA is not preferred.

Although degradation of the bound GPs and recovery of genetic material is a possible mode of operation, inadvertent inactivation of the GPs is very deleterious. It is preferred that maximum limits for solutes that do not inactivate the GPs or denature the target or the column are determined. If the affinity matrices are expendable, one may use conditions that denature the column to elute GPs; before the target is denatured, a portion of the affinity matrix should be removed for possible use as an inoculum. As the GPs are held together by protein-protein interactions and other non-covalent molecular interactions, there will be cases in which the molecular package will bind so tightly to the target molecules on the affinity matrix that the GPs cannot be washed off in viable form. This will only occur when very tight binding has been obtained. In these cases, methods (3) through (5) above can be used to obtain the bound packages or the genetic messages from the affinity matrix.

It is possible, by manipulation of the elution conditions, to isolate SBDs that bind to the target at one pH (pH_(b)) but not at another pH (pH_(o)). The population is applied at pH_(b) and the column is washed thoroughly at pH_(b). The column is then eluted with buffer at pH_(o) and GPs that come off at the new pH are collected and cultured. Similar procedures may be used for other solution parameters, such as temperature. For example, GP(vgPBD)s could be applied to a column supporting insulin. After eluting with salt to remove GPs with little or no binding to insulin, we elute with salt and glucose to liberate GPs that display PBDs that bind insulin or glucose in a competitive manner.

V.N. Amplifying the Enriched Packages

Viable GPs having the selected binding trait are amplified by culture in a suitable medium, or, in the case of phage, infection into a host so cultivated. If the GPs have been inactivated by the chromatography, the OCV carrying the osp-pbd gene are recovered from the GP, and introduced into a new, viable host.

V.O. Determining whether further enrichment is needed

The probability of isolating a GP with improved binding increases by a factor of C_(eff) with each separation cycle. Let N be the number of distinct amino-acid sequences produced by the variegation. We want to perform K separation cycles before attempting to isolate an SBD, where K is such that the probability of isolating a single SBD is 0.10 or higher.

    K = the smallest integer ≥ log₁₀(0.10·N) / log₁₀(C_(eff))

For example, if N were 1.0·10⁷ and C_(eff) = 6.31·10², then log₁₀(1.0·10⁶)/log₁₀(6.31·10²) = 6.0000/2.8000 = 2.14. Therefore we would attempt to isolate SBDs after the third separation cycle. After only two separation cycles, the probability of finding an SBD is

    (6.31·10²)² / (1.0·10⁷) = 0.04

and attempting to isolate SBDs might be profitable.
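The cycle-count estimate above can be reproduced with a short calculation. The following is a minimal sketch, not part of the described method; the function names are illustrative, and the probability is simply capped at 1, which is an assumption made here for convenience.

```python
import math


def cycles_needed(n_sequences: float, c_eff: float, p_min: float = 0.10) -> int:
    """Smallest integer K such that C_eff**K / N is at least p_min."""
    return math.ceil(math.log10(p_min * n_sequences) / math.log10(c_eff))


def isolation_probability(n_sequences: float, c_eff: float, cycles: int) -> float:
    """Approximate chance of picking an SBD after `cycles` separation cycles.

    The value C_eff**cycles / N is capped at 1.0 (an assumption of this sketch).
    """
    return min(1.0, c_eff ** cycles / n_sequences)


# Values taken from the worked example in the text.
N = 1.0e7
C_EFF = 6.31e2

print(cycles_needed(N, C_EFF))                       # number of cycles K
print(round(isolation_probability(N, C_EFF, 2), 2))  # probability after two cycles
```

With the values from the text, the sketch gives K = 3 and a probability of about 0.04 after two cycles, in agreement with the figures above.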

Clonal isolates from the last fraction eluted which contained any viable GPs, as well as clonal isolates obtained by culturing an inoculum taken from the affinity matrix, are cultured in a growth step that is similar to that described previously. Other fractions may be cultured too. If K separation cycles have been completed, samples from a number, e.g. 32, of these clonal isolates are tested for elution properties on the {target} column. If none of the isolated, genetically pure GPs show improved binding to target, or if K cycles have not yet been completed, then we pool and culture, in a manner similar to the manner set forth previously, the GPs from the last few fractions eluted that contained viable GPs and from the GPs obtained by culturing an inoculum taken from the column matrix. We then repeat the enrichment procedure described above. This cyclic enrichment may continue N_(chrom) passes or until an SBD is isolated.

If one or more of the isolated GPs has improved retention on the {target} column, we determine whether the retention of the candidate SBDs is due to affinity for the target material as follows. A second column is prepared using a different support matrix with the target material bound at the optimal density. The elution volumes, under the same elution conditions as used previously, of candidate GP(SBD)s are compared to each other and to GP(PPBD of this round). If one or more candidate GP(SBD)s has a larger elution volume than GP(PPBD of this round), then we pick the GP(SBD) having the highest elution volume and proceed to characterize the population. If none of the candidate GP(SBD)s has higher elution volume than GP(PPBD of this round), then we pool and culture, in a manner similar to the manner used previously, the GPs from the last few fractions that contained viable GPs and the GPs obtained by culturing an inoculum taken from the column matrix. We then repeat the enrichment procedure.

If all of the SBDs show binding that is superior to PPBD of this round, we pool and culture the GPs from the last fraction that contains viable GPs and from the inoculum taken from the column. This population is rechromatographed for at least one pass to fractionate the GPs further based on K_(d).

If an RNA phage were used as GP, the RNA would either be cultured with the assistance of a helper phage or be reverse transcribed and the DNA amplified. The amplified DNA could then be sequenced or subcloned into suitable plasmids.

V.P. Characterizing the Putative SBDs

We characterize members of the population showing desired binding properties by genetic and biochemical methods. We obtain clonal isolates and test these strains by genetic and affinity methods to determine genotype and phenotype with respect to binding to target. For several genetically pure isolates that show binding, we demonstrate that the binding is caused by the artificial chimeric gene by excising the osp-sbd gene and crossing it into the parental GP. We also ligate the deleted backbone of each GP from which the osp-sbd is removed and demonstrate that each backbone alone cannot confer binding to the target on the GP. We sequence the osp-sbd gene from several clonal isolates. Primers for sequencing are chosen from the DNA flanking the osp-ppbd gene or from parts of the osp-ppbd gene that are not variegated.

The present invention is not limited to a single method of determining protein sequences, and reference in the appended claims to determining the amino acid sequence of a domain is intended to include any practical method or combination of methods, whether direct or indirect. The preferred method, in most cases, is to determine the sequence of the DNA that encodes the protein and then to infer the amino acid sequence. In some cases, standard methods of protein-sequence determination may be needed to detect post-translational processing.

The present invention is not limited to a single method of determining the sequence of nucleotides (nts) in DNA subsequences. In the preferred embodiment, plasmids are isolated and denatured in the presence of a sequencing primer, about 20 nts long, that anneals to a region adjacent, on the 5' side, to the region of interest. This plasmid is then used as the template in the four sequencing reactions with one dideoxy substrate in each. Sequencing reactions, agarose gel electrophoresis, and polyacrylamide gel electrophoresis (PAGE) are performed by standard procedures (AUSU87).

For one or more clonal isolates, we may subclone the sbd gene fragment, without the osp fragment, into an expression vector such that each SBD can be produced as a free protein. Because numerous unique restriction sites were built into the inserted domain, it is easy to subclone the gene at any time. Each SBD protein is purified by normal means, including affinity chromatography. Physical measurements of the strength of binding are then made on each free SBD protein by one of the following methods: 1) alteration of the Stokes radius as a function of binding of the target material, measured by characteristics of elution from a molecular sizing column such as agarose, 2) retention of radiolabeled binding protein on a spun affinity column to which has been affixed the target material, or 3) retention of radiolabeled target material on a spun affinity column to which has been affixed the binding protein. The measurements of binding for each free SBD are compared to the corresponding measurements of binding for the PPBD.

In each assay, we measure the extent of binding as a function of concentration of each protein, and other relevant physical and chemical parameters such as salt concentration, temperature, pH, and prosthetic group concentrations (if any).
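As an illustration of how binding measured as a function of concentration can be compared between an SBD and its PPBD, the following sketch assumes a simple one-site equilibrium in which the fraction bound equals [T]/(K_d + [T]), with the free target concentration [T] taken as known. The text does not prescribe this model, and the K_d values below are hypothetical placeholders rather than measured data.

```python
def fraction_bound(target_conc: float, kd: float) -> float:
    """Fraction of binding protein occupied at free target concentration
    `target_conc`, assuming simple 1:1 equilibrium binding (an assumption)."""
    if target_conc < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return target_conc / (kd + target_conc)


# Hypothetical dissociation constants (molar), for illustration only.
KD_PPBD = 1.0e-6
KD_SBD = 1.0e-8

for conc in (1.0e-9, 1.0e-8, 1.0e-7, 1.0e-6):
    print(f"[T] = {conc:.0e} M  "
          f"PPBD: {fraction_bound(conc, KD_PPBD):.3f}  "
          f"SBD: {fraction_bound(conc, KD_SBD):.3f}")
```

Under this model, a domain with the lower K_d reaches half-saturation at a correspondingly lower target concentration, which is the kind of shift the comparison to the PPBD is meant to reveal.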

In addition, the SBD with highest affinity for the target from each round is compared to the best SBD of the previous round (IPBD for the first round) and to the IPBD (second and later rounds) with respect to affinity for the target material. Successive rounds of mutagenesis and selection-through-binding yield increasing affinity until desired levels are achieved.

If we find that the binding is not yet sufficient, we decide which residues to vary next. If the binding is sufficient, then we now have an expression vector bearing a gene encoding the desired novel binding protein.

V.Q. Joint selections

One may modify the affinity separation of the method described to select a molecule that binds to material A but not to material B. One needs to prepare two selection columns, one with material A and the other with material B. The population of genetic packages is prepared in the manner described, but before applying the population to A, one passes the population over the B column so as to remove those members of the population that have high affinity for B ("reverse affinity chromatography"). In the preceding specification, the initial column supported some other molecule simply to remove GP(PBD)s that displayed PBDs having indiscriminate affinity for surfaces.

It may be necessary to amplify the population that does not bind to B before passing it over A. Amplification would most likely be needed if A and B were in some ways similar and the PPBD has been selected for having affinity for A. The optimum order of interactions might be determined empirically. For example, to obtain an SBD that binds A but not B, three columns could be connected in series: a) a column supporting some compound, neither A nor B, or only the matrix material, b) a column supporting B, and c) a column supporting A. A population of GP(vgPBD)s is applied to the series of columns and the columns are washed with the buffer of constant ionic strength that is used in the application. The columns are uncoupled, and the third column is eluted with a gradient to isolate GP(PBD)s that bind A but not B.
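The logic of the three-column series can be sketched as a sequence of filters. This is a deliberately simplified illustration: binding is treated as a yes/no property, whereas real packages show graded affinities and are resolved by gradient elution. The `Package` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Toy genetic package; each flag says whether it is retained by a column."""
    name: str
    binds_matrix: bool  # sticks to column (a): matrix or unrelated compound
    binds_b: bool       # sticks to column (b): material B
    binds_a: bool       # sticks to column (c): material A


def series_selection(population):
    """Return the packages retained on column (c) after passing (a) and (b)."""
    flow = [gp for gp in population if not gp.binds_matrix]  # column (a) depletes sticky GPs
    flow = [gp for gp in flow if not gp.binds_b]             # column (b) depletes B binders
    return [gp for gp in flow if gp.binds_a]                 # column (c) retains A binders


population = [
    Package("sticky", True, True, True),
    Package("binds-both", False, True, True),
    Package("binds-A-only", False, False, True),
    Package("binds-nothing", False, False, False),
]

print([gp.name for gp in series_selection(population)])
```

Only the package that binds A, but neither B nor the matrix, is retained on the third column in this toy population.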

One can also generate molecules that bind to both A and B. In this case we can use a 3D model and mutate one face of the molecule in question to get binding to A. One can then mutate a different face to produce binding to B. When an SBD binds at least somewhat to both A and B, one can mutate the chain by Diffuse Mutagenesis to refine the binding and use a sequential joint selection for binding to both A and B.

The materials A and B could be proteins that differ at only one or a few residues. For example, A could be a natural protein for which the gene has been cloned and B could be a mutant of A that retains the overall 3D structure of A. SBDs selected to bind A but not B probably bind to A near the residues that are mutated in B. If the mutations were picked to be in the active site of A (assuming A has an active site), then an SBD that binds A but not B will bind to the active site of A and is likely to be an inhibitor of A.

To obtain a protein that will bind to both A and B, we can, alternatively, first obtain an SBD that binds A and a different SBD that binds B. We can then combine the genes encoding these domains so that a two-domain single-polypeptide protein is produced. The fusion protein will have affinity for both A and B because one of its domains binds A and the other binds B.

One can also generate binding proteins with affinity for both A and B, such that these materials will compete for the same site on the binding protein. We guarantee competition by overlapping the sites for A and B. Using the procedures of the present invention, we first create a molecule that binds to target material A. We then vary a set of residues defined as: a) those residues that were varied to obtain binding to A, plus b) those residues close in 3D space to the residues of set (a) but that are internal and so are unlikely to bind directly to either A or B. Residues in set (b) are likely to make small changes in the positioning of the residues in set (a) such that the affinities for A and B will be changed by small amounts. Members of these populations are selected for affinity to both A and B.

V.R. Selection for non-binding

The method of the present invention can be used to select proteins that do not bind to selected targets. Consider a protein of pharmacological importance, such as streptokinase, that is antigenic to an undesirable extent. We can take the pharmacologically important protein as IPBD and antibodies against it as target. Residues on the surface of the pharmacologically important protein would be variegated and GP(PBD)s that do not bind to an antibody column would be collected and cultured. Surface residues may be identified in several ways, including: a) from a 3D structure, b) from hydrophobicity considerations, or c) chemical labeling. The 3D structure of the pharmacologically important protein remains the preferred guide to picking residues to vary, except now we pick residues that are widely spaced so that we leave as little as possible of the original surface unaltered.

Destroying binding frequently requires only that a single amino acid in the binding interface be changed. If polyclonal antibodies are used, we face the problem that all or most of the strong epitopes must be altered in a single molecule. Preferably, one would have a set of monoclonal antibodies, or a narrow range of antibody species. If we had a series of monoclonal antibody columns, we could obtain one or more mutations that abolish binding to each monoclonal antibody. We could then combine some or all of these mutations in one molecule to produce a pharmacologically important protein recognized by none of the monoclonal antibodies. Such mutants are tested to verify that the pharmacologically interesting properties have not been altered to an unacceptable degree by the mutations.

Typically, polyclonal antibodies display a range of binding constants for antigen. Even if we have only polyclonal antibodies that bind to the pharmacologically important protein, we may proceed as follows. We engineer the pharmacologically important protein to appear on the surface of a replicable GP. We introduce mutations into residues that are on the surface of the pharmacologically important protein or into residues thought to be on the surface of the pharmacologically important protein so that a population of GPs is obtained. Polyclonal antibodies are attached to a column and the population of GPs is applied to the column at low salt. The column is eluted with a salt gradient. The GPs that elute at the lowest concentration of salt are those which bear pharmacologically important proteins that have been mutated in a way that eliminates binding to the antibodies having maximum affinity for the pharmacologically important protein. The GPs eluting at the lowest salt are isolated and cultured. The isolated SBD becomes the PPBD to further rounds of variegation so that the antigenic determinants are successively eliminated.

V.S. Selection of PBDs for retention of structure

Let us take an SBD with known affinity for a target as PPBD to a variegation of a region of the PBD that is far from the residues that were varied to create the SBD. We can use the target as an affinity molecule to select the PBDs that retain binding for the target, and that presumably retain the underlying structure of the IPBD. The variegations in this case could include insertions and deletions that are likely to disrupt the IPBD structure. We could also use the IPBD and AfM(IPBD) in the same way.

For example, if IPBD were BPTI and AfM(BPTI) were trypsin, we could introduce four or five additional residues after residue 26 and select GPs that display PBDs having specific affinity for AfM(BPTI). Residue 26 is chosen because it is in a turn and because it is about 25 Å from K15, a key amino acid in binding to trypsin.

The underlying structure is most likely to be retained if insertions or deletions are made at loops or turns.

V.T. Engineering of Antagonists

It may be desirable to provide an antagonist to an enzyme or receptor. This may be achieved by making a molecule that prevents the natural substrate or agonist from reaching the active site. Molecules that bind directly to the active site may be either agonists or antagonists. Thus we adopt the following strategy. We consider enzymes and receptors together under the designation TER (Target Enzyme or Receptor).

For most TERs, there exist chemical inhibitors that block the active site. Usually, these chemicals are useful only as research tools due to high toxicity. We make two affinity matrices: one with active TER and one with blocked TER. We make a variegated population of GP(PBD)s and select for SBDs that bind to both forms of the enzyme, thereby obtaining SBDs that do not bind to the active site. We expect that SBDs will be found that bind different places on the enzyme surface. Pairs of the sbd genes are fused with an intervening peptide segment. For example, if SBD-1 and SBD-2 are binding domains that show high affinity for the target enzyme and for which the binding is non-competitive, then the gene sbd-1::linker::sbd-2 encodes a two-domain protein that will show high affinity for the target. We make several fusions having a variety of SBDs and various linkers. Such compounds have a reasonable probability of being an antagonist to the target enzyme.
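The set of candidate fusions grows quickly with the number of SBDs and linkers. The sketch below enumerates constructs in the sbd-1::linker::sbd-2 notation used above. The gene and linker names are hypothetical, and the sketch assumes that every pair of listed SBDs binds non-competitively and that domain order matters, so that sbd-1::linker::sbd-2 and sbd-2::linker::sbd-1 are distinct constructs.

```python
from itertools import permutations


def fusion_constructs(sbd_genes, linkers):
    """List every ordered two-domain fusion, written as first::linker::second."""
    return [
        f"{first}::{linker}::{second}"
        for first, second in permutations(sbd_genes, 2)  # ordered pairs, no self-fusions
        for linker in linkers
    ]


# Hypothetical names, for illustration only.
sbd_genes = ["sbd-1", "sbd-2", "sbd-3"]
linkers = ["linkerA", "linkerB"]

constructs = fusion_constructs(sbd_genes, linkers)
print(len(constructs))
for construct in constructs[:4]:
    print(construct)
```

Three SBDs and two linkers already give twelve ordered constructs, which suggests screening a subset rather than building every combination.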

VI. EXPLOITATION OF SUCCESSFUL BINDING DOMAINS AND CORRESPONDING DNAS

VI.A. Generally

Using the method of the present invention, we can obtain a replicable genetic package that displays a novel protein domain having high affinity and specificity for a target material of interest. Such a package carries both amino-acid embodiments of the binding protein domain and a DNA embodiment of the gene encoding the novel binding domain. The presence of the DNA facilitates expression of a protein comprising the novel binding protein domain within a high-level expression system, which need not be the same system used during the developmental process.

VI.B. Production of Novel Binding Proteins

We can proceed to production of the novel binding protein in several ways, including: a) altering of the gene encoding the binding domain so that the binding domain is expressed as a soluble protein, not attached to a genetic package (either by deleting codons 5' of those encoding the binding domain or by inserting stop codons 3' of those encoding the binding domain), b) moving the DNA encoding the binding domain into a known expression system, and c) utilizing the genetic package as a purification system. (If the domain is small enough, it may be feasible to prepare it by conventional peptide synthesis methods.)

Option (c) may be illustrated as follows. Assume that a novel BPTI derivative has been obtained by selection of M13 derivatives in which a population of BPTI-derived domains is displayed as fusions to mature coat protein. Assume that a specific protease cleavage site (e.g. that of activated clotting factor X) is engineered into the amino-acid sequence between the carboxy terminus of the BPTI-derived domain and the mature coat domain. Furthermore, we alter the display system to maximize the number of fusion proteins displayed on each phage. The desired phage can be produced and purified, for example by centrifugation, so that no bacterial products remain. Treatment of the purified phage with a catalytic amount of activated factor X cleaves the binding domains from the phage particles. A second centrifugation step separates the cleaved protein from the phage, leaving a very pure protein preparation.

VI.C. Mini-Protein Production

As previously mentioned, an advantage inhering from the use of a mini-protein as an IPBD is that it is likely that the derived SBD will also behave like a mini-protein and will be obtainable by means of chemical synthesis. (The term "chemical synthesis", as used herein, includes the use of enzymatic agents in a cell-free environment.)

It is also to be understood that mini-proteins obtained by the method of the present invention may be taken as lead compounds for a series of homologues that contain non-naturally occurring amino acids and groups other than amino acids. For example, one could synthesize a series of homologues in which each member of the series has one amino acid replaced by its D enantiomer. One could also make homologues containing constituents such as β-alanine, aminobutyric acid, 3-hydroxyproline, 2-aminoadipic acid, N-ethylasparagine, norvaline, etc.; these would be tested for binding and other properties of interest, such as stability and toxicity.

Peptides may be chemically synthesized either in solution or on supports. Various combinations of stepwise synthesis and fragment condensation may be employed.

During synthesis, the amino acid side chains are protected to prevent branching. Several different protective groups are useful for the protection of the thiol groups of cysteines:

1) 4-methoxybenzyl (MBzl; Mob)(NISH82; ZAFA88), removable with HF;

2) acetamidomethyl (Acm)(NISH82; NISH86; BECK89c), removable with iodine, mercury ions (e.g., mercuric acetate), or silver nitrate; and

3) S-para-methoxybenzyl (HOUG84).

Other thiol protective groups may be found in standard reference works such as Greene, PROTECTIVE GROUPS IN ORGANIC SYNTHESIS (1981).

Once the polypeptide chain has been synthesized, disulfide bonds must be formed. Possible oxidizing agents include air (HOUG84; NISH86), ferricyanide (NISH82; HOUG84), iodine (NISH82), and performic acid (HOUG84). Temperature, pH, solvent, and chaotropic chemicals may affect the course of the oxidation. Several cysteine-containing peptides have been chemically synthesized in biologically active form: conotoxin G1 (13 AA, 4 Cys)(NISH82); heat-stable enterotoxin ST (18 AA, 6 Cys)(HOUG84); analogues of ST (BHAT86); Ω-conotoxin GVIA (27 AA, 6 Cys)(NISH86; RIVI87b); Ω-conotoxin MVIIA (27 AA, 6 Cys)(OLIV87b); α-conotoxin SI (13 AA, 4 Cys)(ZAFA88); μ-conotoxin IIIa (22 AA, 6 Cys)(BECK89c, CRUZ89, HATA90). Sometimes, the polypeptide naturally folds so that the correct disulfide bonds are formed. Other times, it must be helped along by use of a differently removable protective group for each pair of cysteines.
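To see why directed disulfide formation can be necessary, it helps to count the ways in which a given number of cysteines can be paired completely. The sketch below is a standard combinatorial count, (n-1)(n-3)...(1) for n cysteines, and is offered only as an illustration; it assumes every cysteine participates in exactly one intramolecular disulfide, and the function name is illustrative.

```python
def disulfide_pairings(n_cys: int) -> int:
    """Number of ways to pair all of n_cys cysteines into disulfide bonds.

    Assumes every cysteine forms exactly one intramolecular disulfide,
    so n_cys must be even and non-negative.
    """
    if n_cys < 0 or n_cys % 2 != 0:
        raise ValueError("n_cys must be a non-negative even number")
    count = 1
    for k in range(n_cys - 1, 0, -2):  # (n-1) * (n-3) * ... * 1
        count *= k
    return count


for n in (4, 6):
    print(n, "Cys:", disulfide_pairings(n), "possible pairings")
```

A peptide with 4 cysteines has 3 possible pairings and one with 6 cysteines has 15, only one of which is the native arrangement.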

VI.D. Uses of Novel Binding Proteins

The successful binding domains of the present invention may, alone or as part of a larger protein, be used for any purpose for which binding proteins are suited, including isolation or detection of target materials. In furtherance of this purpose, the novel binding proteins may be coupled directly or indirectly, covalently or noncovalently, to a label, carrier or support.

When used as a pharmaceutical, the novel binding proteins may be combined with suitable carriers or adjuvants.

All references cited anywhere in this specification are incorporated by reference to the extent to which they may be pertinent.

EXAMPLE I DISPLAY OF BPTI AS A FUSION TO M13 GENE VIII PROTEIN

Example I involves display of BPTI on M13 as a fusion to the mature gene VIII coat protein. Each of the DNA constructions was confirmed by restriction digestion analysis and DNA sequencing.

1. Construction of the viii-signal-sequence::bpti::mature-viii-coat-protein Display Vector

A. Operative cloning vectors (OCV)

The operative cloning vectors are M13 and phagemids derived from M13 or f1. The initial construction was in the f1-based phagemid pGEM-3Zf(-)™ (Promega Corp., Madison, Wis.).

A gene comprising, in order: i) a modified lacUV5 promoter, ii) a Shine-Dalgarno sequence, iii) DNA encoding the M13 gene VIII signal sequence, iv) a sequence encoding mature BPTI, v) a sequence encoding the mature-M13-gene-VIII coat protein, vi) multiple stop codons, and vii) a transcription terminator, was constructed. This gene is illustrated in Tables 101-105; each table shows the same DNA sequence with different features annotated. There are a number of differences between this gene and the one proposed in the hypothetical example in the generic specification of the parent application. Because the actual construction was made in pGEM-3Zf(-), the ends of the synthetic DNA were made compatible with SalI and BamHI. The lacO operator of lacUV5 was changed to the symmetrical lacO with the intention of achieving tighter repression in the absence of IPTG. Several silent codon changes were made to minimize the longest segment that is identical to wild-type gene VIII, so that genetic recombination with the co-existing gene VIII is unlikely.

i) OCV based upon pGEM-3Zf

pGEM-3Zf™ (Promega Corp., Madison, Wis.) is a plasmid-based vector containing the amp gene, bacterial origin of replication, bacteriophage f1 origin of replication, a lacZ operon containing a multiple cloning site sequence, and the T7 and SP6 polymerase binding sequences.

Two restriction enzyme recognition sites were introduced, by site-directed oligonucleotide mutagenesis, at the boundaries of the lacZ operon. This allowed for the removal of the lacZ operon and its replacement with the synthetic gene. A BamHI recognition site (GGATCC) was introduced at the 5' end of the lacZ operon by the mutation of bases C₃₃₁ and T₃₃₂ to G and A respectively (numbering of Promega). A SalI recognition site (GTCGAC) was introduced at the 3' end of the operon by the mutation of bases C₃₀₂₁ and T₃₀₂₃ to G and C respectively. A construct combining these variants of pGEM-3Zf was designated pGEM-MB3/4.

ii) OCV based upon M13mp18

M13mp18 (YANI85) is an M13 bacteriophage-based vector (available from, inter alia, New England Biolabs, Beverly, Mass.) consisting of the whole of the phage genome into which has been inserted a lacZ operon containing a multiple cloning site sequence (MESS77). Two restriction enzyme sites were introduced into M13mp18 using standard methods. A BamHI recognition site (GGATCC) was introduced at the 5' end of the lacZ operon by the mutation of bases C₆₀₀₃ and G₆₀₀₄ to A and T respectively (numbering of Messing). This mutation also destroyed a unique NarI site. A SalI recognition site (GTCGAC) was introduced at the 3' end of the operon by the mutation of bases A₆₄₃₀ and C₆₄₃₂ to C and A respectively. A construct combining these variants of M13mp18 was designated M13-MB1/2.
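The statement that the C-to-A and G-to-T changes create a BamHI site while destroying a NarI site can be checked on a short sequence. The NarI recognition sequence is GGCGCC, so changing its central CG to AT yields GGATCC. In the sketch below the flanking bases are hypothetical and the indices are 0-based positions in the toy sequence, not the Messing numbering of M13mp18.

```python
BAMHI = "GGATCC"
NARI = "GGCGCC"


def mutate(seq: str, changes: dict) -> str:
    """Apply point mutations given as {0-based index: (expected_base, new_base)}."""
    bases = list(seq)
    for index, (expected, new) in changes.items():
        if bases[index] != expected:
            raise ValueError(f"expected {expected} at {index}, found {bases[index]}")
        bases[index] = new
    return "".join(bases)


# Hypothetical flanking bases around a NarI site (not the real M13mp18 sequence).
wild_type = "TTAA" + NARI + "AATT"

# C -> A and G -> T at the two central positions of the NarI site.
mutant = mutate(wild_type, {6: ("C", "A"), 7: ("G", "T")})

print(wild_type, "NarI:", NARI in wild_type, "BamHI:", BAMHI in wild_type)
print(mutant, "NarI:", NARI in mutant, "BamHI:", BAMHI in mutant)
```

The same helper could be used to check the SalI mutations once the actual flanking sequence is supplied.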

B) Synthetic Gene

A synthetic gene (VIII-signal-sequence::mature-bpti::mature-VIII-coat-protein) was constructed from 16 synthetic oligonucleotides (Table 105), custom synthesized by Genetic Designs Inc. of Houston, Tex., using methods detailed in KIMH89 and ASHM89. Table 101 shows the DNA sequence; Table 102 contains an annotated version of this sequence. Table 103 shows the overlaps of the synthetic oligonucleotides in relationship to the restriction sites and coding sequence. Table 104 shows the synthetic DNA in double-stranded form. Table 105 shows each of the 16 synthetic oligonucleotides from 5'-to-3'. The oligonucleotides were phosphorylated, with the exception of the 5' most molecules, using standard methods, annealed and ligated in stages such that a final synthetic duplex was generated. The overhanging ends of this duplex were filled in with T4 DNA polymerase and the duplex was cloned into the HincII site of pGEM-3Zf(-); the initial construct is called pGEM-MB1 (Table 101a). Double-stranded DNA of pGEM-MB1 was cut with PstI, filled in with T4 DNA polymerase and ligated to a SalI linker (New England BioLabs) so that the synthetic gene is bounded by BamHI and SalI sites (Table 101b and Table 102b). The synthetic gene was obtained on a BamHI-SalI cassette and cloned into pGEM-MB3/4 and M13-MB1/2 utilizing the BamHI and SalI sites previously introduced, to generate the constructs designated pGEM-MB16 and M13-MB15, respectively. The full length of the synthetic insert was sequenced and found to be unambiguously correct except for: 1) a missing G in the Shine-Dalgarno sequence; and 2) a few silent errors in the third bases of some codons (shown as upper case in Table 101). Table 102 shows the ribosome-binding site A₁₀₄ GGAGG but the actual sequence is A₁₀₄ GAGG. Efforts to express protein from this construction, in vivo and in vitro, were unavailing.

C) Alterations to the synthetic gene

i) Ribosome binding site (RBS)

Starting with the construct pGEM-MB16, a fragment of DNA bounded by the restriction enzyme sites SacI and NheI (containing the original RBS) was replaced with a synthetic oligonucleotide duplex (with compatible SacI and NheI overhangs) containing the sequence for a new RBS that is very similar to the RBS of E. coli phoA and that has been shown to be functional. ##STR9##

The putative RBSs above are lower case and the initiating methionine codon is underscored and bold. The resulting construct was designated pGEM-MB20. In vitro expression of the gene carried by pGEM-MB20 produced a novel protein species of the expected size, about 14.5 kd.

ii) tac promoter

In order to obtain higher expression levels of the fusion protein, the lacUV5 promoter was changed to a tac promoter. Starting with the construct pGEM-MB16, which contains the lacUV5 promoter, a fragment of DNA bounded by the restriction enzyme sites BamHI and HpaII was excised and replaced with a compatible synthetic oligonucleotide duplex containing the -35 sequence of the trp promoter (cf. RUSS82). This converted the lacUV5 promoter to a tac promoter in a construct designated pGEM-MB22 (Table 112). ##STR10##

Promoter and RBS variants of the fusion protein gene were constructed by basic DNA manipulation techniques to generate the following:

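    ____________________________________________________________
    Construct    Promoter   RBS    Encoded Protein
    ____________________________________________________________
    pGEM-MB16    lac        old    VIIIs.p.-BPTI-matureVIII
    pGEM-MB20    lac        new    VIIIs.p.-BPTI-matureVIII
    pGEM-MB22    tac        old    VIIIs.p.-BPTI-matureVIII
    pGEM-MB26    tac        new    VIIIs.p.-BPTI-matureVIII
    ____________________________________________________________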

The synthetic gene from variants pGEM-MB20 and pGEM-MB26 were reclonedinto the altered phage vector M13-MB1/2 to generate the phage constructsdesignated M13-MB27 and M13-MB28 respectively.
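The promoter/RBS combinations listed above can be held in a small lookup structure. The following is only an illustrative sketch: the class name `Construct`, the helper `find`, and the dictionary `PHAGE_DERIVATIVE` are hypothetical names introduced here; the construct names, promoters, RBS labels and phage derivatives are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construct:
    """One promoter/RBS variant of the VIIIs.p.-BPTI-matureVIII gene."""
    name: str
    promoter: str  # "lac" or "tac", as labeled in the table
    rbs: str       # "old" or "new"


# Rows of the table in the text.
CONSTRUCTS = [
    Construct("pGEM-MB16", "lac", "old"),
    Construct("pGEM-MB20", "lac", "new"),
    Construct("pGEM-MB22", "tac", "old"),
    Construct("pGEM-MB26", "tac", "new"),
]

# Phage constructs obtained by recloning into M13-MB1/2 (from the text).
PHAGE_DERIVATIVE = {"pGEM-MB20": "M13-MB27", "pGEM-MB26": "M13-MB28"}


def find(promoter: str, rbs: str) -> list[str]:
    """Return the names of constructs with the given promoter and RBS."""
    return [c.name for c in CONSTRUCTS if c.promoter == promoter and c.rbs == rbs]


if __name__ == "__main__":
    for name in find("tac", "new"):
        print(name, "->", PHAGE_DERIVATIVE.get(name, "no phage derivative listed"))
```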

iii) Signal peptide sequence

In vitro expression of the synthetic gene regulated by tac and the "new" RBS produced a novel protein of the expected size for the unprocessed protein (about 16 kd). In vivo expression also produced novel protein of full size; no processed protein could be seen on phage or in cell extracts by silver staining or by Western analysis with anti-BPTI antibody.

Thus we analyzed the signal sequence of the fusion. Table 106 shows a number of typical signal sequences. Charged residues are generally thought to be of great importance and are shown bold and underscored. Each signal sequence contains a long stretch of uncharged residues that are mostly hydrophobic; these are shown in lower case. At the right, in parentheses, is the length of the stretch of uncharged residues. We note that the fusions of gene VIII signal to BPTI and gene III signal to BPTI have rather short uncharged segments. These short uncharged segments may reduce or prevent processing of the fusion peptides. We know that the gene III signal sequence is capable of directing: a) insertion of the peptide comprising (mature-BPTI)::(mature-gene-III-protein) into the lipid bilayer, and b) translocation of BPTI and most of the mature gene III protein across the lipid bilayer (vide infra). That the gene III protein remains anchored in the lipid bilayer until the phage is assembled is directed by the uncharged anchor region near the carboxy terminus of the mature gene III protein (see Table 116) and not by the secretion signal sequence. The phoA signal sequence can direct secretion of mature BPTI into the periplasm of E. coli (MARK86). Furthermore, there is controversy over the mechanism by which mature authentic gene VIII protein comes to be in the lipid bilayer prior to phage assembly.

Thus we decided to replace the DNA coding on expression for the gene-VIII-putative-signal-sequence by each of: 1) DNA coding on expression for the phoA signal sequence, 2) DNA coding on expression for the bla signal sequence, or 3) DNA coding on expression for the M13 gene III signal. Each of these replacements produces a tripartite gene encoding a fusion protein that comprises, in order: (a) a signal peptide that directs secretion into the periplasm of parts (b) and (c), derived from a first gene; (b) an initial potential binding domain (BPTI in this case), derived from a second gene (in this case, the second gene is an animal gene); and (c) a structural packaging signal (the mature gene VIII coat protein), derived from a third gene.

The process by which the IPBD::packaging-signal fusion arrives on the phage surface is illustrated in FIG. 1. In FIG. 1a, we see that authentic gene VIII protein appears (by whatever process) in the lipid bilayer so that both the amino and carboxy termini are in the cytoplasm. Signal peptidase-I cleaves the gene VIII protein liberating the signal peptide (that is absorbed by the cell) and mature gene VIII coat protein that spans the lipid bilayer. Many copies of mature gene VIII coat protein accumulate in the lipid bilayer awaiting phage assembly (FIG. 1c). Some signal sequences are able to direct the translocation of quite large proteins across the lipid bilayer. If additional codons are inserted after the codons that encode the cleavage site of the signal peptidase-I of such a potent signal sequence, the encoded amino acids will be translocated across the lipid bilayer as shown in FIG. 1b. After cleavage by signal peptidase-I, the amino acids encoded by the added codons will be in the periplasm but anchored to the lipid bilayer by the mature gene VIII coat protein, FIG. 1d. The circular single-stranded phage DNA is extruded through a part of the lipid bilayer containing a high concentration of mature gene VIII coat protein; the carboxy terminus of each coat protein molecule packs near the DNA while the amino terminus packs on the outside. Because the fusion protein is identical to mature gene VIII coat protein within the trans-bilayer domain, the fusion protein will co-assemble with authentic mature gene VIII coat protein as shown in FIG. 1e.

In each case, the mature VIII coat protein moiety is intended to co-assemble with authentic mature VIII coat protein to produce phage particles having BPTI domains displayed on the surface. The source and character of the secretion signal sequence is not important because the signal sequence is cut away and degraded. The structural packaging signal, however, is quite important because it must co-assemble with the authentic coat protein to make a working virus sheath.

a) Bacterial Alkaline Phosphatase (phoA) Signal Peptide

Construct pGEM-MB26 contains a fragment of DNA bounded by restriction enzyme sites SacI and AccIII which contains the new RBS and sequences encoding the initiating methionine and the signal peptide of M13 gene VIII pro-protein. This fragment was replaced with a synthetic duplex (constructed from four annealed oligonucleotides) containing the RBS and DNA coding for the initiating methionine and signal peptide of PhoA (INOU82). The resulting construct was designated pGEM-MB42; the sequence of the fusion gene is shown in Table 113. M13MB48 is a derivative of GemMB42. A BamHI-SalI DNA fragment from GemMB42, containing the gene construct, was ligated into a similarly cleaved vector M13MB1/2 giving rise to M13MB48. ##STR11##

b) beta-lactamase signal peptide

To enable the introduction of the beta-lactamase (amp) promoter and DNA coding for the signal peptide into the gene encoding (mature-BPTI)::(mature-VIII-coat-protein), an initial manipulation of the amp gene (encoding beta-lactamase) was required. Starting with pGEM-3Zf, an AccIII recognition site (TCCGGA) was introduced into the amp gene adjacent to the DNA sequence encoding the amino acids at the beta-lactamase signal peptide cleavage site. Using standard methods of in vitro site-directed oligonucleotide mutagenesis, bases C₂₅₀₄ and A₂₅₀₁ were converted to T and G respectively to generate the construct designated pGEM-MB40. Further manipulation of pGEM-MB40 entailed the insertion of a synthetic oligonucleotide linker (CGGATCCG) containing the BamHI recognition sequence (GGATCC) into the AatII site (GACGTC starting at nucleotide number 2260) to generate the construct designated pGEM-MB45. The DNA bounded by the restriction enzyme sites of BamHI and AccIII contains the amp promoter, amp RBS, initiating methionine and beta-lactamase signal peptide. This fragment was used to replace the corresponding fragment from pGEM-MB26 to generate construct pGEM-MB46. ##STR12##
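The recognition sequences and linker quoted in this paragraph can be checked mechanically. The sketch below, using only the sequences given in the text, tests that the 8-mer linker carries the BamHI site and that the linker and the AccIII site are self-complementary (palindromic in the restriction-site sense). The function names are hypothetical helpers introduced for illustration.

```python
# Translation table for complementing an unambiguous DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

ACCIII_SITE = "TCCGGA"      # recognition site introduced into the amp gene
BAMHI_SITE = "GGATCC"       # BamHI recognition sequence
BAMHI_LINKER = "CGGATCCG"   # synthetic linker inserted at the AatII site


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(seq: str) -> bool:
    """True if the sequence reads the same on both strands, 5' to 3'."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    print("linker contains BamHI site:", BAMHI_SITE in BAMHI_LINKER)
    print("linker self-complementary:", is_self_complementary(BAMHI_LINKER))
    print("AccIII site self-complementary:", is_self_complementary(ACCIII_SITE))
```

A self-complementary linker anneals to itself to form a blunt duplex, which is why a single 8-mer suffices for the insertion described.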

c) M13-gene-III-signal::bpti::mature-VIII-coat-protein

We may also construct, as depicted in FIG. 5, M13-MB51 which would carry a gene encoding a fusion of M13-gene-III-signal-peptide to the previously described BPTI::mature VIII coat protein. First the BstEII site that follows the stop codons of the synthetic gene VIII is changed to an AlwNI site as follows. DNA of pGEM-MB26 is cut with BstEII and the ends filled in by use of Klenow enzyme; a blunt AlwNI linker is ligated to this DNA. This construction is called pGEM-MB26Alw. The XhoI to AlwNI fragment (approximately 300 bp) of pGEM-MB26Alw is purified. RF DNA from phage MK-BPTI (vide infra) is cut with AlwNI and XhoI and the large fragment purified. These two fragments are ligated together; the resulting construction is named M13-MB51. Because M13-MB51 contains no gene III, the phage can not form plaques. M13-MB51 can, however, render cells Km^(R). Infectious phage particles can be obtained by use of helper phage. As explained below, the gene III signal sequence is capable of directing (BPTI)::(mature-gene-III-protein) to the surface of phage. In M13-MB51, we have inserted DNA encoding gene VIII coat protein (50 amino acids) and three stop codons 5' to the DNA encoding the mature gene III protein.

    ______________________________________
                                     Signal
               Promoter   RBS        sequence   Fusion protein
    ______________________________________
    pGEM-MB26  ##STR13##  new        ##STR14##  BPTI/VIII-coat
    pGEM-MB42  ##STR15##  new        ##STR16##  BPTI/VIII-coat
    pGEM-MB46  ##STR17##  ##STR18##  ##STR19##  BPTI/VIII-coat
    pGEM-MB51  ##STR20##  ##STR21##  ##STR22##  BPTI/VIII-coat
    M13 MB48   ##STR23##  new        ##STR24##  BPTI/VIII-coat
    ______________________________________

2. Analysis of the Protein Products Encoded by the Synthetic (signal-peptide::mature-bpti::viii-coat-protein) Genes

i) In vitro analysis

A coupled transcription/translation prokaryotic system (Amersham Corp., Arlington Heights, Ill.) was utilized for the in vitro analysis of the protein products encoded by the BPTI/VIII synthetic gene and the variants derived from this.

Table 107 lists the protein products encoded by the listed vectors which are visualized by the standard method of fluorography following in vitro synthesis in the presence of ³⁵S-methionine and separation of the products using SDS polyacrylamide gel electrophoresis. In each sample a pre-beta-lactamase product (approximately 31 kd) can be seen. This is derived from the amp gene which is the common selection gene for each of the vectors. In addition, a (pre-BPTI/VIII) product encoded by the synthetic gene and variants can be seen as indicated. The migration of these species (approximately 14.5 kd) is consistent with the expected size of the encoded proteins.

ii) In vivo analysis

The vectors detailed in sections (B) and (C) were freshly transfected into the E. coli strain XL1-Blue™ (Stratagene, La Jolla, Calif.) and into strain SEF'. E. coli strain SE6004 (LISS85) carries the prlA4 mutation and is more permissive in secretion than strains that carry the wild-type prlA allele. SE6004 is F⁻ and is deleted for lacI; thus the cells can not be infected by M13 and lacUV5 and tac promoters can not be regulated with IPTG. Strain SEF' is derived from strain SE6004 (LISS85) by crossing with XL1-Blue™; the F' in XL1-Blue™ carries Tc^(R) and lacI^(q). SE6004 is streptomycin^(R), Tc^(S) while XL1-Blue™ is streptomycin^(S), Tc^(R) so that both parental strains can be killed with the combination of Tc and streptomycin. SEF' retains the secretion-permissive phenotype of the parental strain, SE6004 (prlA4).
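The selection logic for the cross can be sketched as a set comparison. The resistance markers of the two parental strains are taken from the text; that SEF' carries both markers (streptomycin^(R) from the SE6004 background, Tc^(R) from the transferred F') is an inference from the stated selection, not an explicit statement in the text. The names `STRAINS` and `survivors` are hypothetical.

```python
# Drug resistances per strain. Parental entries are from the text;
# the SEF' entry is inferred from the described selection (assumption).
STRAINS = {
    "SE6004": {"streptomycin"},
    "XL1-Blue": {"tetracycline"},
    "SEF'": {"streptomycin", "tetracycline"},
}


def survivors(drugs: list[str]) -> list[str]:
    """Return the strains resistant to every drug present in the medium."""
    present = set(drugs)
    return sorted(name for name, resistant in STRAINS.items() if present <= resistant)


if __name__ == "__main__":
    # Under double selection only the exconjugant is expected to grow.
    print(survivors(["tetracycline", "streptomycin"]))
```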

The fresh transfectants were grown in NZCYM medium (SAMB89) for 1 hour after which IPTG was added over the range of concentrations 1.0 μM to 0.5 mM (to derepress the lacUV5 and tac promoters) and grown for an additional 1.5 hours.

Aliquots of the bacterial cells expressing the synthetic insert encoded proteins together with the appropriate controls (no vector, vector with no insert and zero IPTG) were lysed in SDS gel loading buffer and electrophoresed in 20% polyacrylamide gels containing SDS and urea. Duplicate gels were either silver stained (Daiichi, Tokyo, Japan) or electrotransferred to a nylon matrix (Immobilon from Millipore, Bedford, Mass.) for western analysis by standard means using rabbit anti-BPTI polyclonal antibodies.

Table 108 lists the interesting proteins visualized on a silver stained gel and by western analysis of an identical gel. We can see clearly in the western analysis that protein species containing BPTI epitopes are present in the test strains which are absent from the control strains and which are also IPTG inducible. In XL1-Blue™, the migration of this species is predominantly that of the unprocessed form of the pro-protein although a small proportion of the encoded proteins appear to migrate at a size consistent with that of a fully processed form. In SEF', the processed form predominates, there being only a faint band corresponding to the unprocessed species.

Thus in strain SEF', we have produced a tripartite fusion protein that is specifically cleaved after the secretion signal sequence. We believe that the mature protein comprises BPTI followed by the gene VIII coat protein and that the coat protein moiety spans the membrane. We believe that it is highly likely that one or more copies, perhaps hundreds of copies, of this protein will co-assemble into M13 derived phage or M13-like phagemids. This construction will allow us to a) mutagenize the BPTI domain, b) display each of the variants on the coat of one or more phage (one type per phage), and c) recover those phage that display variants having novel binding properties with respect to target materials of our choice.

Rasched and Oberer (RASC86) report that phage produced in cells that express two alleles of gene VIII, that have differences within the first 11 residues of the mature coat protein, contain some of each protein. Thus, because we have achieved in vivo processing of the phoA(signal)::bpti::matureVIII fusion gene, it is highly likely that co-expression of this gene with wild-type VIII will lead to production of phage bearing BPTI domains on their surface. Mutagenesis of the bpti domain of these genes will provide a population of phage, each phage carrying a gene that codes for the variant of BPTI displayed on the phage surface.

VIII Display Phage: Production, Preparation and Analysis

i. Phage Production

The OCV can be grown in XL1-Blue™ in the absence of the inducing agent, IPTG. Typically, a plaque plug is taken from a plate and grown in 2 ml of medium, containing freshly diluted bacterial cells, for 6 to 8 hours. Following centrifugation of this culture the supernatant is taken and the phage titer determined. This is kept as a phage stock for further infection, phage production and display of the gene product of interest.

A 100 fold dilution of a fresh overnight culture of SEF' bacterial cells in 500 ml of NZCYM medium is allowed to grow to a cell density of 0.4 (Ab 600nm) in a shaker incubator at 37° C. To this culture is added a sufficient amount of the phage stock to give a MOI of 10 together with IPTG to give a final concentration of 0.5 mM. The culture is allowed to grow for a further 2 hrs.
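The amount of phage stock needed for a given MOI follows from simple arithmetic. The sketch below is illustrative only: the culture volume (500 ml) and MOI (10) come from the text, but the cell concentration at an absorbance of 0.4 and the stock titer are not given in the text and are placeholder assumptions that must be replaced with measured values.

```python
def phage_stock_volume_ml(culture_ml: float, cells_per_ml: float,
                          moi: float, stock_titer_per_ml: float) -> float:
    """Volume of phage stock (ml) that supplies `moi` phage per cell."""
    total_cells = culture_ml * cells_per_ml
    phage_needed = total_cells * moi
    return phage_needed / stock_titer_per_ml


if __name__ == "__main__":
    volume = phage_stock_volume_ml(
        culture_ml=500.0,          # from the text
        cells_per_ml=2e8,          # ASSUMPTION: not stated in the text
        moi=10.0,                  # from the text
        stock_titer_per_ml=1e12,   # ASSUMPTION: hypothetical stock titer
    )
    print(f"add about {volume:.2f} ml of phage stock")
```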

ii. Phage Preparation and Purification

The phage producing bacterial culture is centrifuged to separate the phage in the supernatant from the bacterial pellet. To the supernatant is added one quarter by volume of phage precipitation solution (20% PEG, 3.75 M ammonium acetate) and PMSF to a final concentration of 1 mM. It is left on ice for 2 hours after which the precipitated phage is retrieved by centrifugation. The phage pellet is redissolved in Tris-EDTA containing 0.1% Sarkosyl and left at 4° C. for 1 hour after which any bacteria and bacterial debris are removed by centrifugation. The phage in the supernatant is reprecipitated with PEG overnight at 4° C. The phage pellet is resuspended in LB medium and reprecipitated another two times to remove the detergent. The phage is stored in LB medium at 4° C., titered and used for analysis and binding studies.

A more stringent phage purification scheme involves centrifugation in a CsCl gradient. 3.86 g of CsCl is dissolved in NET buffer (0.1 M NaCl, 1 mM EDTA, 0.1 M Tris pH 7.7) up to a volume of 10 ml. 10¹² to 10¹³ phage in TE Sarkosyl buffer are mixed with 5 ml of CsCl NET buffer and transferred to a sealable ultracentrifuge tube. Centrifugation is performed overnight at 34 K rpm in a Sorvall OTD-65B Ultracentrifuge. The tubes are opened and 400 μl aliquots are carefully removed. 5 μl aliquots are removed from the fractions and analysed by agarose gel electrophoresis after heating at 65° C. for 15 minutes together with the gel loading buffer containing 0.1% SDS. Fractions containing phage are pooled, the phage reprecipitated and finally redissolved in LB medium to a concentration of 10¹² to 10¹³ phage per ml.
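As a worked check of the precipitation step, the final concentrations after adding one quarter volume of the precipitation solution can be computed by a simple dilution. The stock concentrations and the volume ratio are from the text; the function name is a hypothetical helper, and the small volume contributed by the PMSF addition is ignored.

```python
def final_concentration(stock: float, added_fraction: float) -> float:
    """Concentration after adding `added_fraction` volumes of stock
    to one volume of sample (e.g. 0.25 for one quarter by volume)."""
    return stock * added_fraction / (1.0 + added_fraction)


if __name__ == "__main__":
    peg = final_concentration(20.0, 0.25)   # percent PEG
    salt = final_concentration(3.75, 0.25)  # molar ammonium acetate
    print(f"final PEG: {peg:.1f}%, final ammonium acetate: {salt:.2f} M")
```

Under these assumptions the phage are precipitated at roughly 4% PEG and 0.75 M ammonium acetate.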

iii. Phage Analysis

The display phage, together with appropriate controls, are analyzed using standard methods of polyacrylamide gel electrophoresis and either silver staining of the gel or electrotransfer to a nylon matrix followed by analysis with anti-BPTI antiserum (Western analysis). Quantitation of the display of heterologous proteins is achieved by running a serial dilution of the starting protein, for example BPTI, together with the display phage samples in the electrophoresis and Western analyses described above. An alternative method involves running a 2 fold serial dilution of a phage in which both the major coat protein and the fusion protein are visualized by silver staining. A comparison of the relative ratios of the two protein species allows one to estimate the number of fusion proteins per phage since the number of VIII gene encoded proteins per phage (approximately 3000) is known.

Incorporation of fusion protein into bacteriophage

In vivo expression of the processed BPTI:VIII fusion protein, encoded by vectors GemMB42 (above and Table 113) and M13MB48 (above), implied that the processed fusion product was likely to be correctly located within the bacterial cell membrane. This localization made it possible that it could be incorporated into the phage and that the BPTI moiety would be displayed at the bacteriophage surface.

SEF' cells were infected with either M13MB48 (consisting of the starting phage vector M13mp18, altered as described above, containing the synthetic gene consisting of a tac promoter, functional ribosome binding site, phoA signal peptide, mature BPTI and mature major coat protein) or M13mp18, as a control. Phage infections, preparation and purification were performed as described in Example VIII.

The resulting phage were electrophoresed (approximately 10¹¹ phage per lane) in a 20% polyacrylamide gel containing urea followed by electrotransfer to a nylon matrix and western analysis using anti-BPTI rabbit serum. A single species of protein was observed in phage derived from infection with the M13MB48 stock phage which was not observed in the control infection. This protein had a migration of about 12 kd, consistent with that of the fully processed fusion protein.

Western analysis of SEF' bacterial lysate, with or without phage infection, demonstrated another species of protein of about 20 kd. This species was also present, to a lesser degree, in phage preparations which were simply PEG precipitated without further purification (for example, using nonionic detergent or by CsCl gradient centrifugation). A comparison of M13MB48 phage preparations made in the presence or absence of detergent demonstrated that sarkosyl treatment and CsCl gradient purification did remove the bacterial contaminant while having no effect on the presence of the BPTI:VIII fusion protein. This indicates that the fusion protein has been incorporated and is a constituent of the phage body.

The time course of phage production and BPTI:VIII incorporation was followed post-infection and after IPTG induction. Phage production and fusion protein incorporation appeared to be maximal after two hours. This time course was utilized in further phage productions and analyses.

Polyacrylamide electrophoresis of the phage preparations, followed by silver staining, demonstrated that the preparations were essentially free of contaminating protein species and that an extra protein band was present in M13MB48 derived phage which was not present in the control phage. The size of the new protein was consistent with that seen by western analysis. A similar analysis of a serially diluted BPTI:VIII incorporated phage demonstrated that the ratio of fusion protein to major coat protein was typically in the range of 1:150. Since the phage is known to contain on the order of 3000 copies of the gene VIII product, this means that the phage population contains, on average, tens of copies of the fusion protein per phage.
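The estimate of copies per phage is a one-line calculation from the two figures quoted above. The sketch below, with a hypothetical function name, multiplies the approximate number of gene VIII proteins per phage by the observed fusion-to-coat ratio.

```python
def fusion_copies_per_phage(coat_copies: float, fusion_per_coat: float) -> float:
    """Estimated fusion proteins per phage, given the number of major coat
    proteins per phage and the fusion:coat ratio read from the gel."""
    return coat_copies * fusion_per_coat


if __name__ == "__main__":
    # About 3000 coat proteins per phage and a 1:150 ratio, both from the text.
    estimate = fusion_copies_per_phage(3000, 1 / 150)
    print(f"about {estimate:.0f} fusion proteins per phage")
```

With the quoted values the estimate is about 20 copies per phage, in line with the "tens of copies" stated in the text.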

Altering the initiating methionine of the natural gene VIII

The OCV M13MB48 contains the synthetic gene encoding the BPTI:VIII fusion protein in the intergenic region of the modified M13mp18 phage vector. The remainder of the vector consists of the M13 genome which contains the genes necessary for various bacteriophage functions, such as DNA replication and phage formation. In an attempt to increase the phage incorporation of the fusion protein, we decided to try to diminish the production of the natural gene VIII product, the major coat protein, by altering the codon for the initiating methionine of this gene to one encoding leucine. In such cases, methionine is actually incorporated, but the rate of initiation is reduced. The change was achieved by standard methods of site-specific oligonucleotide mutagenesis as follows. ##STR25##

Note that the 3' end of the XI gene overlaps with the 5' end of the VIII gene. Changes in DNA sequence were designed such that the desired change in the VIII gene product could be achieved without alterations to the predicted amino acid sequence of the gene XI product. A diagnostic PvuII recognition site was introduced at this site.

It was anticipated that initiation of the natural gene VIII product would be hindered, enabling a higher proportion of the fusion protein to be incorporated into the resulting phage.

Analyses of the phage derived from this modified vector indicated that there was a significant increase in the ratio of fusion protein to major coat protein. Quantitative estimates indicated that within a phage population as many as 100 copies of the BPTI:VIII fusion were incorporated per phage.

Incorporation of interdomain extension fusion proteins into phage

A phage pool containing a variegated pentapeptide extension at the BPTI:coat protein interface (see Example VII) was used to infect SEF' cells. IPTG induction, phage production and preparation were as described in Example VIII. Using the criteria detailed in the previous section, it was determined that extended fusion proteins were incorporated into phage. Gel electrophoresis of the generated phage, followed by either silver staining or western analysis with anti-BPTI rabbit serum, demonstrated fusion proteins that migrated similarly to, but discernibly slower than, the starting fusion protein.

With regard to the `EGGGS linker` [SEQ ID NO:10] extensions of the domain interface, individual phage stocks predicted to contain one or more 5-amino-acid unit extensions were analyzed in a similar fashion. The migration of the extended fusion proteins was readily distinguishable from that of the parent fusion protein when viewed by western analysis or silver staining. Those clones analyzed in more detail included M13.3X4 (which contains a single inverted EGGGS linker with a predicted amino acid sequence of GGGSL) (SEQ ID NO:16), M13.3X7 (which contains a correctly orientated linker with predicted amino acid sequence of EGGGS) (SEQ ID NO:10), M13.3X11 (which contains 3 linkers with an inversion and a predicted amino acid sequence for the extension of EGGGSGSSSLGSSSL) (SEQ ID NO:11) and M13.3Xd, which contains an extension consisting of at least 5 linkers or 25 amino acids.

The extended fusion proteins were all incorporated into phage at high levels (on average tens of copies per phage were present) and, when analyzed by gel electrophoresis, migrated at rates consistent with the predicted size of the extension. Clones M13.3X4 and M13.3X7 migrated at a position very similar to but discernibly different from the parent fusion protein, while M13.3X11 and M13.3Xd were markedly larger.

Display of BPTI:VIII fusion protein by bacteriophage

The BPTI:VIII fusion protein had been shown to be incorporated into the body of the phage. This phage was analyzed further to demonstrate that the BPTI moiety was accessible to specific antibodies and hence displayed at the phage surface.

The assay is detailed in Example II but principally involves the addition of purified anti-BPTI IgG (from the serum of BPTI injected rabbits) to a known titer of phage. Following incubation, protein A-agarose beads are added to bind the IgG and left to incubate overnight. The IgG-protein A beads and any bound phage are removed by centrifugation followed by a retitering of the supernatant to determine any loss of phage. The phage bound to the beads can be acid eluted and titered also. Appropriate controls are included in the assay, such as a wild type phage stock (M13mp18) and IgG purified from normal rabbit pre-immune serum.

Table 140 shows that while the titer of the wild type phage is unaltered by the presence of anti-BPTI IgG, BPTI-IIIMK (the positive control for the assay) demonstrated a significant drop in titer with or without the extra addition of protein A beads. (Note that since the BPTI moiety is part of the III gene product which is involved in the binding of phage to bacterial pili, such a phenomenon is entirely expected.) Two batches of M13MB48 phage (containing the BPTI:VIII fusion protein) demonstrated a significant reduction in titer, as judged by plaque forming units, when anti-BPTI antibodies and protein A beads were added to the phage. The initial drop in titer with the antibody alone differs somewhat between the two batches of phage. This may be a result of experimental or batch variation. Retrieval of the immunoprecipitated phage, while not quantitative, was significant when compared to the wild type phage control.

Further control experiments relating to this section are shown in Table 141 and Table 142. The data demonstrated that the loss in titer observed for the BPTI:VIII containing phage is a result of the display of BPTI epitopes by these phage and the specific interaction with anti-BPTI antibodies. No significant interaction with either protein A agarose beads or IgG purified from normal rabbit serum could be demonstrated. The larger drop in titer for M13MB48 batch five reflects the higher level incorporation of the fusion protein in this preparation.

Functionality of the BPTI moiety in the BPTI-VIII display phage

The previous two sections demonstrated that the BPTI:VIII fusion protein has been incorporated into the phage body and that the BPTI moiety is displayed at the phage surface. To demonstrate that the displayed molecule is functional, binding experiments were performed in a manner almost identical to that described in the previous section except that proteases were used in place of antibodies. The display phage, together with appropriate controls, are allowed to interact with immobilized proteases or immobilized inactivated proteases. Binding can be assessed by monitoring the loss in titer of the display phage or by determining the number of phage bound to the respective beads.

Table 143 shows the results of an experiment in which BPTI:VIII display phage, M13MB48, were allowed to bind to anhydrotrypsin-agarose beads. There was a significant drop in titer when compared to wild type phage, which do not display BPTI. A pool of phage (5AA Pool), each containing a variegated 5 amino acid extension at the BPTI:major coat protein interface, demonstrated a similar decline in titer. In a control experiment (Table 143) very little non-specific binding of the above display phage was observed with agarose beads to which an unrelated protein (streptavidin) is attached.

Actual binding of the display phage is demonstrated by the data shown for two experiments in Table 144. The negative control is wild type M13mp18 and the positive control is BPTI-IIIMK, a phage in which the BPTI moiety, attached to the gene III protein, has been shown to be displayed and functional. M13MB48 and M13MB56 both bind to anhydrotrypsin beads in a manner comparable to that of the positive control, being 40 to 60 times better than the negative control (non-display phage). Hence functionality of the BPTI moiety, in the major coat fusion protein, was established.

To take this analysis one step further, a comparison of phage binding to active and inactivated trypsin is shown in Table 145. The control phage, M13mp18 and BPTI-III MK, demonstrated binding similar to that detailed in Example III. Note that the relative binding is enhanced with trypsin due to the apparent marked reduction in the non-specific binding of the wild type phage to the active protease. M13.3X7 and M13.3X11, which both contain `EGGGS` linker extensions at the domain interface, bound to anhydrotrypsin and trypsin in a manner similar to BPTI-IIIMK phage. The binding, relative to non-display phage, was approximately 100 fold higher in the anhydrotrypsin binding assay and at least 1000 fold higher in the trypsin binding assay. The binding of another `EGGGS` linker variant (M13.3Xd) was similar to that of M13.3X7.

To demonstrate the specificity of binding, the assays were repeated with human neutrophil elastase (HNE) beads and compared to that seen with trypsin beads (Table 146). BPTI has a very high affinity for trypsin and a low affinity for HNE; hence the BPTI display phage should reflect these affinities when used in binding assays with these beads. The negative and positive controls for trypsin binding were as already described above, while an additional positive control for the HNE beads, BPTI(K15L,MGNG)-III MA (see Example III), was included. (The amino acid sequence MGNG has SEQ ID NO:12; BPTI(. . . ,MGNG) denotes a homologue of BPTI having M₃₉, G₄₀, N₄₁, G₄₂, where . . . may indicate other alterations.) The results, shown in Table 146, confirmed this prediction. M13MB48, M13.3X7 and M13.3X11 phage demonstrated good binding to trypsin relative to wild type phage and the HNE control (BPTI(K15L,MGNG)-III MA), being comparable to BPTI-IIIMK phage. Conversely, poor binding occurred when HNE beads were used, with the exception of the HNE positive control phage.

Taken together, the accumulated data demonstrated that when BPTI is part of a fusion protein with the major coat protein of M13 phage, the molecule is both displayed at the surface of the phage and a significant proportion of it is functional in a specific protease binding manner.
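Relative binding of the kind quoted here ("40 to 60 times better than the negative control") can be expressed as a ratio of bound fractions. The sketch below is only an illustration of that arithmetic: the function names are hypothetical and the titers in the example are invented placeholder numbers, not data from Table 144.

```python
def fraction_bound(input_pfu: float, eluted_pfu: float) -> float:
    """Fraction of the input phage recovered from the beads."""
    return eluted_pfu / input_pfu


def relative_binding(display_fraction: float, control_fraction: float) -> float:
    """Fold binding of a display phage relative to a non-display control."""
    return display_fraction / control_fraction


if __name__ == "__main__":
    # HYPOTHETICAL titers for illustration only (not from the source tables).
    display = fraction_bound(input_pfu=1e10, eluted_pfu=5e7)
    control = fraction_bound(input_pfu=1e10, eluted_pfu=1e6)
    print(f"relative binding: about {relative_binding(display, control):.0f}-fold")
```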

EXAMPLE II CONSTRUCTION OF BPTI/GENE-III DISPLAY VECTOR

DNA manipulations were conducted according to standard procedures asdescribed in Maniatis et al. (MANI82). First the unwanted lacZ gene ofM13-MB1/2 was removed. M13-MB1/2 RF was cut with BamHI and SalI and thelarge fragment was isolated by agarose gel electrophoresis. Therecovered 6819 bp fragment was filled in with Klenow fragment of E. coliDNA polymerase and ligated to a synthetic HindIII 8mer linker(CAAGCTTG). The ligation sample was used to transfect competentXL1-Blue™ (Stratagene, La Jolla, Calif.) cells which were subsequentlyplated for plaque formation. RF DNA was prepared from chosen plaques anda clone, M13-MB1/2-delta, containing regenerated BamHI and SalI sites aswell as a new HindIII site, all 500 bp upstream of the BglII site (6935)was picked.

A unique NarI site was introduced into codons 17 and 18 of gene III (changing the amino acids from H-S to G-A; cf. Table 110). 10⁶ phage produced from bacterial cells harboring the M13-MB1/2-delta RF DNA were used to infect a culture of CJ236 cells (relevant genotype: F', dut1, ung1, Cm^(R)) (OD595=0.35). Following overnight incubation at 37° C., phage were recovered and uracil-containing ss DNA was extracted from phage in accord with the instructions for the MUTA-GENE™ M13 in vitro Mutagenesis Kit (Catalogue Number 170-3571, Bio-Rad, Richmond, Calif.). Two hundred nanograms of the purified single stranded DNA was annealed to 3 picomoles of a phosphorylated 25 mer mutagenic oligonucleotide,

    5'-gtttcagcggCgCCagaatagaaag-3'

(where upper case indicates the changes). Following filling in with T4 DNA polymerase and ligation with T4 DNA ligase, the reaction sample was used to transfect competent XL1-Blue(™) cells which were subsequently plated to permit the formation of plaques.
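The properties of the mutagenic oligonucleotide stated above can be checked with a short script. The following is a minimal sketch, not part of the original procedure; the helper names are illustrative, and the reading of GGC GCC as codons 17 and 18 is an assumption based on the statement that the NarI site lies in those codons.

```python
# Mutagenic oligonucleotide as printed; upper case marks the changed bases.
OLIGO = "gtttcagcggCgCCagaatagaaag"
NARI_SITE = "GGCGCC"  # NarI recognition sequence

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Only the two codons needed here, taken from the standard genetic code.
CODONS = {"GGC": "G", "GCC": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (case-insensitive)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def changed_positions(oligo: str) -> list[int]:
    """1-based positions of the upper-case (mutated) bases."""
    return [i + 1 for i, base in enumerate(oligo) if base.isupper()]


oligo_length = len(OLIGO)
site_count = OLIGO.upper().count(NARI_SITE)
site_start = OLIGO.upper().find(NARI_SITE) + 1  # 1-based
# The site is palindromic, so it reads the same on either strand.
site_is_palindrome = reverse_complement(NARI_SITE) == NARI_SITE
# Assumed framing: the six-base site spans two whole codons.
encoded = "-".join(CODONS[NARI_SITE[i:i + 3]] for i in (0, 3))

print(oligo_length, site_count, site_start, changed_positions(OLIGO), encoded)
```

Because the NarI site is palindromic, the same G-A reading is obtained whichever strand the oligonucleotide corresponds to.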

RF DNA, isolated from phage-infected cells which had been allowed to propagate in liquid culture for 8 hours, was denatured, spotted on a Nytran membrane, baked and hybridized to the 25 mer mutagenic oligonucleotide which had previously been phosphorylated with ³²P-ATP. Clones exhibiting strong hybridization signals at 70° C. (6° C. less than the theoretical Tm of the mutagenic oligonucleotide) were chosen for large scale RF preparation. The presence of a unique NarI site at nucleotide 1630 was confirmed by restriction enzyme analysis. The resultant RF DNA, M13-MB1/2-delta-NarI, was cut with BamHI, dephosphorylated with calf intestinal phosphatase, and ligated to a 1.3 Kb BamHI fragment, encoding the kanamycin-resistance gene (kan), derived from plasmid pUC4K (Pharmacia, Piscataway, N.J.). The ligation sample was used to transfect competent XL1-Blue(™) cells which were subsequently plated onto LB plates containing kanamycin (Km). RF DNA was prepared from Km^(R) colonies and subjected to restriction enzyme analysis to confirm the insertion of kan into M13-MB1/2-delta-NarI DNA, thereby creating the phage MK. Phage MK grows as well as wild-type M13, indicating that the changes at the cleavage site of gene III protein are not detectably deleterious to the phage.

INSERTION OF SYNTHETIC BPTI GENE

The construction of the BPTI-III expression vector is shown in FIG. 6. The synthetic bpti-VIII fusion contains a NarI site that comprises the last two codons of the BPTI-encoding region. A second NarI site was introduced upstream of the BPTI-encoding region as follows. RF DNA of phage M13-MB26 was cut with AccIII and ligated to the dsDNA adaptor: ##STR26## The ligation sample was subsequently restricted with NarI and a 180 bp DNA fragment encoding BPTI was isolated by agarose gel electrophoresis. RF DNA of phage MK was digested with NarI, dephosphorylated with calf intestinal phosphatase and ligated to the 180 bp fragment. Ligation samples were used to transfect competent XL1-Blue(™) cells which were plated to enable the formation of plaques. DNA, isolated from phage derived from plaques, was denatured, applied to a Nytran membrane, baked and hybridized to a ³²P-phosphorylated double stranded DNA probe corresponding to the BPTI gene. Large scale RF preparations were made for clones exhibiting a strong hybridization signal. Restriction enzyme digestion analysis confirmed the insertion of a single copy of the synthetic BPTI gene into gene III of MK to generate phage MK-BPTI. Subsequent DNA sequencing confirmed that the sequence of the bpti-III fusion gene is correct and that the correct reading frame is maintained (Table 111). Table 116 shows the entire coding region, the translation into protein sequence, and the functional parts of the polypeptide chain.

EXPRESSION OF THE BPTI-III FUSION GENE IN VITRO

MK-BPTI RF DNA was added to a coupled prokaryotic transcription-translation extract (Amersham). Newly synthesized radiolabelled proteins were produced and subsequently separated by electrophoresis on a 15% SDS-polyacrylamide gel subjected to fluorography. The MK-BPTI DNA directs the synthesis of an unprocessed gene III fusion protein which is 7 Kd larger than the gene III product encoded by MK. This is consistent with the insertion of 58 amino acids of BPTI into the gene III protein. Immunoprecipitation of radiolabelled proteins generated by the cell-free prokaryotic extract was conducted. Neither rabbit anti(M13-gene-VIII-protein) IgG nor normal rabbit IgG was able to immunoprecipitate the gene III protein encoded by either MK or MK-BPTI. However, rabbit anti-BPTI IgG is able to immunoprecipitate the gene III protein encoded by MK-BPTI but not that encoded by MK. This confirms that the increase in size of the III protein encoded by MK-BPTI is attributable to the insertion of the BPTI protein.
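The size shift expected from a 58-residue insertion can be estimated with a rule-of-thumb calculation. The following is a rough sketch only; the average residue mass of about 110 daltons is a common approximation and is an assumption here, not a value taken from this application.

```python
# Rule-of-thumb average mass of one amino acid residue in a protein.
# This is an assumed approximation, not a measured value for BPTI.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approximate_mass_kd(residue_count: int) -> float:
    """Approximate polypeptide mass in kilodaltons from its length."""
    return residue_count * AVERAGE_RESIDUE_MASS_DA / 1000.0


bpti_insert_kd = approximate_mass_kd(58)
print(f"Estimated size of the 58-residue BPTI insert: {bpti_insert_kd:.2f} Kd")
```

The estimate of about 6.4 Kd is of the same order as the roughly 7 Kd shift read from the gel, which is as close an agreement as SDS-polyacrylamide gel sizing would be expected to give.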

WESTERN ANALYSIS

Phage were recovered from bacterial cultures by PEG precipitation. To remove residual bacterial cells, recovered phage were resuspended in a high salt buffer and subjected to centrifugation, in accord with the instructions for the MUTA-GENE(®) M13 in vitro Mutagenesis Kit (Catalogue Number 170-3571, Bio-Rad, Richmond, Calif.). Aliquots of phage (containing up to 40 μg of protein) were subjected to electrophoresis on a 12.5% SDS-urea-polyacrylamide gel and proteins were transferred to a sheet of Immobilon by electro-transfer. Western blots were developed using rabbit anti-BPTI serum, which had previously been incubated with an E. coli extract, followed by goat anti-rabbit antibody conjugated to alkaline phosphatase. An immunoreactive protein of 67 Kd is detected in preparations of the MK-BPTI phage but not the MK phage. The size of the immunoreactive protein is consistent with the predicted size of a processed BPTI-III fusion protein (6.4 Kd plus 60 Kd). These data indicate that BPTI-specific epitopes are presented on the surface of the MK-BPTI phage but not the MK phage.

NEUTRALIZATION OF PHAGE TITER WITH AGAROSE-IMMOBILIZED ANHYDRO-TRYPSIN

Anhydro-trypsin is a derivative of trypsin in which the active site serine has been converted to dehydroalanine. Anhydro-trypsin retains the specific binding of trypsin but not the protease activity. Unlike polyclonal antibodies, anhydro-trypsin is not expected to bind unfolded BPTI or incomplete fragments.

Phage MK-BPTI and MK were diluted to a concentration of 1.4·10¹² particles per ml in TBS buffer (PARM88) containing 1.0 mg/ml BSA. Thirty microliters of diluted phage were added to 2, 5, or 10 microliters of a 50% slurry of agarose-immobilized anhydro-trypsin (Pierce Chemical Co., Rockford, Ill.) in TBS/BSA buffer. Following incubation at 25° C., aliquots were removed, diluted in ice cold LB broth and titered for plaque-forming units on a lawn of XL1-Blue(™) cells. Table 114 illustrates that incubation of the MK-BPTI phage with immobilized anhydro-trypsin results in a very significant loss in titer over a four hour period while no such effect is observed with the MK (control) phage. The reduction in phage titer is also proportional to the amount of immobilized anhydro-trypsin added to the MK-BPTI phage. Incubation with five microliters of a 50% slurry of agarose-immobilized streptavidin (Sigma, St. Louis, Mo.) in TBS/BSA buffer does not reduce the titer of either the MK-BPTI or MK phage. These data are consistent with the presentation of a correctly-folded, functional BPTI protein on the surface of the MK-BPTI phage but not on the MK phage. Unfolded or incomplete BPTI domains are not expected to bind anhydro-trypsin. Furthermore, unfolded BPTI domains are expected to be non-specifically sticky.

NEUTRALIZATION OF PHAGE TITER WITH ANTI-BPTI ANTIBODY

MK-BPTI and MK phage were diluted to a concentration of 4·10⁸ plaque-forming units per ml in LB broth. Fifteen microliters of diluted phage were added to an equivalent volume of either rabbit anti-BPTI serum or normal rabbit serum (both diluted 10 fold in LB broth). Following incubation at 37° C., aliquots were removed, diluted by 10⁴ in ice-cold LB broth and titered for plaque-forming units on a lawn of XL1-Blue(™) cells. Incubation of the MK-BPTI phage with anti-BPTI serum results in a steady loss in titer over a two hour period while no such effect is observed with the MK phage. As expected, normal rabbit serum does not reduce the titer of either the MK-BPTI or the MK phage. Prior incubation of the anti-BPTI serum with authentic BPTI protein, but not with an equivalent amount of E. coli protein, blocks the ability of the serum to reduce the titer of the MK-BPTI phage. These data are consistent with the presentation of BPTI-specific epitopes on the surface of the MK-BPTI phage but not the MK phage. More specifically, the data indicate that these BPTI epitopes are associated with the gene III protein and that association of this fusion protein with an anti-BPTI antibody blocks its ability to mediate the infection of bacterial cells.

NEUTRALIZATION OF PHAGE TITER WITH TRYPSIN

MK-BPTI and MK phage were diluted to a concentration of 4·10⁸ plaque-forming units per ml in LB broth. Diluted phage were added to an equivalent volume of trypsin diluted to various concentrations in LB broth. Following incubation at 37° C., aliquots were removed, diluted by 10⁴ in ice cold LB broth and titered for plaque-forming units on a lawn of XL1-Blue(™) cells. Incubation of the MK-BPTI phage with 0.15 μg of trypsin results in a 70% loss in titer after a two hour period while only a 15% loss in titer is observed for the MK phage. A reduction in the amount of trypsin added to phage results in a reduction in the loss of titer. However, at all trypsin concentrations investigated, the MK-BPTI phage are more sensitive to incubation with trypsin than the MK phage. An interpretation of these data is that association of the BPTI-III fusion protein displayed on the surface of the MK-BPTI phage with trypsin blocks its ability to mediate the infection of bacterial cells.

The reduction in titer of phage MK by trypsin is an example of a phenomenon that is likely to be general: proteases, if present in sufficient quantity, will degrade proteins on the phage and reduce infectivity. The present application lists several means that can be used to overcome this problem.

AFFINITY SELECTION SYSTEM

Affinity Selection with Immobilized Anhydro-Trypsin

MK-BPTI and MK phage were diluted to a concentration of 1.4·10¹² particles per ml in TBS buffer (PARM88) containing 1.0 mg/ml BSA. We added 4.0·10¹⁰ phage to 5 microliters of a 50% slurry of either agarose-immobilized anhydro-trypsin beads (Pierce Chemical Co.) or agarose-immobilized streptavidin beads (Sigma) in TBS/BSA. Following a 3 hour incubation at room temperature, the beads were pelleted by centrifugation for 30 seconds at 5000 rpm in a microfuge and the supernatant fraction was collected. The beads were washed 5 times with TBS/Tween buffer (PARM88) and after each wash the beads were pelleted by centrifugation and the supernatant was removed. Finally, beads were resuspended in elution buffer (0.1 N HCl containing 1.0 mg/ml BSA adjusted to pH 2.2 with glycine) and, following a 5 minute incubation at room temperature, the beads were pelleted by centrifugation. The supernatant was removed and neutralized by the addition of 1.0 M Tris-HCl buffer, pH 8.0.
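The phage input stated above fixes the volume of diluted phage that must have been used, which is not given explicitly in this passage. The following is a small sketch of that arithmetic; the function names are illustrative and are not taken from the application.

```python
STOCK_PARTICLES_PER_ML = 1.4e12  # stated concentration of diluted phage
INPUT_PARTICLES = 4.0e10         # stated number of phage added to the beads


def volume_ul_for(particles: float, particles_per_ml: float) -> float:
    """Volume in microliters that contains the requested number of particles."""
    return particles / particles_per_ml * 1000.0


def particles_in(volume_ul: float, particles_per_ml: float) -> float:
    """Number of particles contained in a given volume in microliters."""
    return particles_per_ml * volume_ul / 1000.0


implied_volume_ul = volume_ul_for(INPUT_PARTICLES, STOCK_PARTICLES_PER_ML)
particles_in_30_ul = particles_in(30.0, STOCK_PARTICLES_PER_ML)

print(f"Implied input volume: {implied_volume_ul:.1f} microliters")
print(f"Particles in 30 microliters: {particles_in_30_ul:.2e}")
```

The implied volume of roughly 29 microliters agrees with the 30 microliter aliquots used in the anhydro-trypsin neutralization experiment, so the stated input appears to be a rounded value for the same aliquot size.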

Aliquots of phage samples were applied to a Nytran membrane using a Schleicher and Schuell (Keene, N.H.) filtration minifold and phage DNA was immobilized onto the Nytran by baking at 80° C. for 2 hours. The baked filter was incubated at 42° C. for 1 hour in pre-wash solution (MANI82) and pre-hybridization solution (5Prime-3Prime, West Chester, Pa.). The 1.0 Kb NarI (base 1630)/XmnI (base 2646) DNA fragment from MK RF was radioactively labelled with ³²P-dCTP using an oligolabelling kit (Pharmacia, Piscataway, N.J.). The radioactive probe was added to the Nytran filter in hybridization solution (5Prime-3Prime) and, following overnight incubation at 42° C., the filter was washed and subjected to autoradiography.

The efficiency of this affinity selection system can be semi-quantitatively determined using the dot-blot procedure described elsewhere in the present application. Exposure of MK-BPTI-phage-treated anhydro-trypsin beads to elution buffer releases bound MK-BPTI phage. Streptavidin beads do not retain phage MK-BPTI. Anhydro-trypsin beads do not retain phage MK. In the experiment depicted in Table 115, we estimate that 20% of the total MK-BPTI phage were bound to 5 microliters of the immobilized anhydro-trypsin and were subsequently recovered by washing the beads with elution buffer (pH 2.2 HCl/glycine). Under the same conditions, no detectable MK-BPTI phage were bound and subsequently recovered from the streptavidin beads. The amount of MK-BPTI phage recovered in the elution fraction is proportional to the amount of immobilized anhydro-trypsin added to the phage. No detectable MK phage were bound to either the immobilized anhydro-trypsin or streptavidin beads and no phage were recovered with elution buffer. These data indicate that the affinity selection system described above can be utilized to select for phage displaying a specific folded protein (in this case, BPTI). Unfolded or incomplete BPTI domains are not expected to bind anhydro-trypsin.

Affinity Selection with Anti-BPTI antibodies

MK-BPTI and MK phage were diluted to a concentration of 1·10¹⁰ particles per ml in Tris buffered saline solution (PARM88) containing 1.0 mg/ml BSA. 2·10⁸ phage were added to 2.5 μg of either biotinylated rabbit anti-BPTI IgG in TBS/BSA or biotinylated rabbit anti-mouse antibody IgG (Sigma) in TBS/BSA, and incubated overnight at 4° C. A 50% slurry of streptavidin-agarose (Sigma), washed three times with TBS buffer prior to incubation with 30 mg/ml BSA in TBS buffer for 60 minutes at room temperature, was washed three times with TBS/Tween buffer (PARM88) and resuspended to a final concentration of 50% in this buffer. Samples containing phage and biotinylated IgG were diluted with TBS/Tween prior to the addition of streptavidin-agarose in TBS/Tween buffer. Following a 60 minute incubation at room temperature, streptavidin-agarose beads were pelleted by centrifugation for 30 seconds and the supernatant fraction was collected. The beads were washed 5 times with TBS/Tween buffer and after each wash the beads were pelleted by centrifugation and the supernatant was removed. Finally, the streptavidin-agarose beads were resuspended in elution buffer (0.1 N HCl containing 1.0 mg/ml BSA adjusted to pH 2.2 with glycine), incubated 5 minutes at room temperature, and pelleted by centrifugation. The supernatant was removed and neutralized by the addition of 1.0 M Tris-HCl buffer, pH 8.0.

Aliquots of phage samples were applied to a Nytran membrane using a Schleicher and Schuell minifold apparatus. Phage DNA was immobilized onto the Nytran by baking at 80° C. for 2 hours. Filters were washed for 60 minutes in pre-wash solution (MANI82) at 42° C. then incubated at 42° C. for 60 minutes in Southern pre-hybridization solution (5Prime-3Prime). The 1.0 Kb NarI (1630 bp)/XmnI (2646 bp) DNA fragment from MK RF was radioactively labelled with ³²P-αdCTP using an oligolabelling kit (Pharmacia, Piscataway, N.J.). Nytran membranes were transferred from pre-hybridization solution to Southern hybridization solution (5Prime-3Prime) at 42° C. The radioactive probe was added to the hybridization solution and, following overnight incubation at 42° C., the filter was washed 3 times with 2×SSC, 0.1% SDS at room temperature and once at 65° C. in 2×SSC, 0.1% SDS. Nytran membranes were subjected to autoradiography. The efficiency of the affinity selection system can be semi-quantitatively determined using the above dot blot procedure. Comparison of dots A1 and B1 or C1 and D1 indicates that the majority of phage did not stick to the streptavidin-agarose beads. Washing with TBS/Tween buffer removes the majority of phage which are non-specifically associated with streptavidin beads. Exposure of the streptavidin beads to elution buffer releases bound phage only in the case of MK-BPTI phage which have previously been incubated with biotinylated rabbit anti-BPTI IgG. These data indicate that the affinity selection system described above can be utilized to select for phage displaying a specific antigen (in this case BPTI). We estimate an enrichment factor of at least 40 fold based on the calculation ##EQU2##

EXAMPLE III CHARACTERIZATION AND FRACTIONATION OF CLONALLY PURE POPULATIONS OF PHAGE, EACH DISPLAYING A SINGLE CHIMERIC APROTININ HOMOLOGUE/M13 GENE III PROTEIN

This Example demonstrates that chimeric phage proteins displaying a target-binding domain can be eluted from immobilized target by decreasing pH, and that the pH at which the protein is eluted is dependent on the binding affinity of the domain for the target.

Standard Procedures

Unless otherwise noted, all manipulations were carried out at room temperature. Unless otherwise noted, all cells are XL1-Blue(™) (Stratagene, La Jolla, Calif.).

1) Demonstration of the Binding of BPTI-III MK Phage to Active Trypsin Beads

Previous experiments designed to verify that BPTI displayed by fusion phage is functional relied on the use of immobilized anhydro-trypsin, a catalytically inactive form of trypsin. Although anhydro-trypsin is essentially identical to trypsin structurally (HUBE75, YOKO77) and in binding properties (VINC74, AKOH72), we also demonstrated that BPTI-III fusion phage bind immobilized active trypsin. Demonstration of the binding of fusion phage to immobilized active protease and subsequent recovery of infectious phage facilitates subsequent experiments where the preparation of inactive forms of serine proteases by protein modification is laborious or not feasible.

Fifty μl of BPTI-III MK phage (identified as MK-BPTI in U.S. Ser. No. 07/487,063) Example 17 (3.7·10¹¹ pfu/ml) in either 50 mM Tris, pH 7.5, 150 mM NaCl, 1.0 mg/ml BSA (TBS/BSA) buffer or 50 mM sodium citrate, pH 6.5, 150 mM NaCl, 1.0 mg/ml BSA (CBS/BSA) buffer were added to 10 μl of a 25% slurry of immobilized trypsin (Pierce Chemical Co., Rockford, Ill.) also in TBS/BSA or CBS/BSA. As a control, 50 μl MK phage (9.3·10¹² pfu/ml) were added to 10 μl of a 25% slurry of immobilized trypsin in either TBS/BSA or CBS/BSA buffer. The infectivity of BPTI-III MK phage is 25-fold lower than that of MK phage; thus the conditions chosen above ensure that an approximately equivalent number of phage particles are added to the trypsin beads. After 3 hours of mixing on a Labquake shaker (Labindustries Inc., Berkeley, Calif.), 0.5 ml of either TBS/BSA or CBS/BSA was added where appropriate to the samples. Beads were washed for 5 min and recovered by centrifugation for 30 sec. The supernatant was removed and 0.5 ml of TBS/0.1% Tween-20 was added. The beads were mixed for 5 minutes on the shaker and recovered by centrifugation as above. The supernatant was removed and the beads were washed an additional five times with TBS/0.1% Tween-20 as described above. Finally, the beads were resuspended in 0.5 ml of elution buffer (0.1 M HCl containing 1.0 mg/ml BSA adjusted to pH 2.2 with glycine), mixed for 5 minutes and recovered by centrifugation. The supernatant fraction was removed and neutralized by the addition of 130 μl of 1 M Tris, pH 8.0. Aliquots of the neutralized elution sample were diluted in LB broth and titered for plaque-forming units on a lawn of cells.
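The statement that the two inputs contain an approximately equivalent number of phage particles follows from the stated titers and the 25-fold infectivity difference. The following sketch makes that arithmetic explicit; the function names are illustrative, and particle numbers are expressed in relative units (pfu corrected for infectivity), since absolute particle counts are not given here.

```python
VOLUME_ML = 0.050                   # fifty microliters of each phage stock
BPTI_III_MK_TITER = 3.7e11          # pfu/ml, as stated
MK_TITER = 9.3e12                   # pfu/ml, as stated
INFECTIVITY_FOLD_REDUCTION = 25.0   # BPTI-III MK relative to MK, as stated


def pfu_loaded(titer_pfu_per_ml: float, volume_ml: float) -> float:
    """Plaque-forming units contained in the loaded volume."""
    return titer_pfu_per_ml * volume_ml


def relative_particles(pfu: float, fold_reduction: float) -> float:
    """Particle number in MK-equivalent units, correcting for lower infectivity."""
    return pfu * fold_reduction


fusion_pfu = pfu_loaded(BPTI_III_MK_TITER, VOLUME_ML)
control_pfu = pfu_loaded(MK_TITER, VOLUME_ML)
fusion_particles = relative_particles(fusion_pfu, INFECTIVITY_FOLD_REDUCTION)
control_particles = relative_particles(control_pfu, 1.0)
particle_ratio = fusion_particles / control_particles

print(f"pfu loaded: fusion {fusion_pfu:.2e}, control {control_pfu:.2e}")
print(f"fusion/control particle ratio: {particle_ratio:.3f}")
```

After correction the two inputs differ by less than one percent, even though the control contains about 25 times more plaque-forming units.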

Table 201 illustrates that a significant percentage of the input BPTI-III MK phage bound to immobilized trypsin and was recovered by washing with elution buffer. The amount of fusion phage which bound to the beads was greater in TBS buffer (pH 7.5) than in CBS buffer (pH 6.5). This is consistent with the observation that the affinity of BPTI for trypsin is greater at pH 7.5 than at pH 6.5 (VINC72, VINC74). A much lower percentage of the MK control phage (which do not display BPTI) bound to immobilized trypsin and this binding was independent of the pH conditions. At pH 6.5, 1675 times more of the BPTI-III MK phage than of the MK phage bound to trypsin beads, while at pH 7.5 a 2103-fold difference was observed. Hence fusion phage displaying BPTI adhere not only to anhydro-trypsin beads but also to active trypsin beads and can be recovered as infectious phage. These data, in conjunction with earlier findings, strongly suggest that BPTI displayed on the surface of fusion phage is appropriately folded and functional.

2) Generation of P1 Mutants of BPTI

To demonstrate the specificity of interaction of BPTI-III fusion phage with immobilized serine proteases, single amino acid substitutions were introduced at the P1 position (residue 15 of mature BPTI) of the BPTI-III fusion protein by site-directed mutagenesis. A 25mer mutagenic oligonucleotide (P1) was designed to substitute a LEU codon for the LYS₁₅ codon. This alteration is desired because BPTI(K15L) is a moderately good inhibitor of human neutrophil elastase (HNE) (K_(d) =2.9·10⁻⁹ M) (BECK88b) and a poor inhibitor of trypsin. A fusion phage displaying BPTI(K15L) should bind to immobilized HNE but not to immobilized trypsin. BPTI-III MK fusion phage would be expected to display the opposite phenotype (bind to trypsin, fail to bind to HNE). These observations would illustrate the binding specificity of BPTI-III fusion phage for immobilized serine proteases.

Mutagenesis of the P1 region of the BPTI-VIII gene contained within the intergenic region of recombinant phage MB46 was carried out using the Muta-Gene M13 In Vitro Mutagenesis Kit (Bio-Rad, Richmond, Calif.). MB46 phage (7.5·10⁶ pfu) were used to infect a 50 ml culture of CJ236 cells (O.D.600=0.5). Following overnight incubation at 37° C., phage were recovered and uracil-containing single-stranded DNA was extracted from the phage. The single-stranded DNA was further purified by NACS chromatography as recommended by the manufacturer (B.R.L., Gaithersburg, Md.).

Two hundred nanograms of the purified single-stranded DNA were annealed to 3 picomoles of the phosphorylated 25mer mutagenic oligonucleotide (P1). Following filling in with T4 DNA polymerase and ligation with T4 DNA ligase, the sample was used to transfect competent cells which were subsequently plated on LB plates to permit the formation of plaques. Phage derived from picked plaques were applied to a Nytran membrane using a Schleicher and Schuell (Keene, N.H.) minifold I apparatus (Dot Blot Procedure). Phage DNA was immobilized onto the filter by baking at 80° C. for 2 hours. The filter was bathed in 1×Southern pre-hybridization buffer (5Prime-3Prime, West Chester, Pa.) for 2 hours. Subsequently, the filter was incubated in 1×Southern hybridization solution (5Prime-3Prime) containing a 21mer probing oligonucleotide (LEU1) which had been radioactively labelled with gamma-³²P-ATP (N.E.N./DuPont, Boston, Mass.) by T4 polynucleotide kinase (New England BioLabs (NEB), Beverly, Mass.). Following overnight hybridization, the filter was washed 3 times with 6×SSC at room temperature and once at 60° C. in 6×SSC prior to autoradiography. Clones exhibiting strong hybridization signals were chosen for large scale Rf preparation using the PZ523 spin column protocol (5Prime-3Prime). Restriction enzyme analysis confirmed that the structure of the Rf was correct and DNA sequencing confirmed the substitution of a LEU codon (TTG) for the LYS₁₅ codon (AAA). This Rf DNA was designated MB46(K15L).

3) Generation of the BPTI-III MA Vector

The original gene III fusion phage MK can be detected on the basis of its ability to transduce cells to kanamycin resistance (Km^(R)). It was deemed advantageous to generate a second gene III fusion vector which can confer resistance to a different antibiotic, namely ampicillin (Ap). One could then mix a fusion phage conferring Ap^(R) while displaying engineered protease inhibitor A (EPI-A) with a second fusion phage conferring Km^(R) while displaying EPI-B. The mixture could be added to an immobilized serine protease and, following elution of bound fusion phage, one could evaluate the relative affinity of the two EPIs for the immobilized protease from the relative abundance of phage that transduce cells to Km^(R) or Ap^(R).
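One way to read out such a two-marker competition is to compare the Ap^(R) to Km^(R) ratio in the eluate with the same ratio in the input mixture. The following is a minimal sketch of that comparison; the function name and all colony counts are hypothetical illustrations, not data from this application, and the sketch assumes that both markers are scored with equal efficiency.

```python
def relative_enrichment(input_ap: float, input_km: float,
                        eluted_ap: float, eluted_km: float) -> float:
    """Fold change of the Ap^R:Km^R ratio between input and eluate.

    A value above 1 suggests that the inhibitor displayed by the Ap^R phage
    was retained by the immobilized protease better than the inhibitor
    displayed by the Km^R phage; a value below 1 suggests the opposite.
    """
    input_ratio = input_ap / input_km
    eluted_ratio = eluted_ap / eluted_km
    return eluted_ratio / input_ratio


# Hypothetical transductant counts, for illustration only.
example_fold = relative_enrichment(
    input_ap=1.0e9, input_km=1.0e9,    # equal mixture applied to the beads
    eluted_ap=2.0e6, eluted_km=5.0e4,  # transductants recovered after elution
)
print(f"Ap^R phage enriched {example_fold:.0f}-fold relative to Km^R phage")
```

Normalizing to the input ratio matters because the two phage stocks may differ in infectivity, so equal particle numbers need not give equal numbers of transductants.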

The Ap^(R) gene is contained in the vector pGem3Zf (Promega Corp., Madison, Wisc.), which can be packaged as single stranded DNA contained in bacteriophage when helper phage are added to bacteria containing this vector. The recognition sites for restriction enzymes SmaI and SnaBI were engineered into the 3' non-coding region of the Ap^(R) (β-lactamase) gene using the technique of synthetic oligonucleotide directed site specific mutagenesis. The single stranded DNA was used as the template for in vitro mutagenesis leading to the following DNA sequence alterations (numbering as supplied by Promega): a) to create a SmaI (or XmaI) site, bases T₁₁₁₅ →C and A₁₁₁₆ →C, and b) to create a SnaBI site, G₁₁₂₅ →T, C₁₁₂₉ →T, and T₁₁₃₀ →A. The alterations were confirmed by radiolabelled probe analysis with the mutating oligonucleotide and restriction enzyme analysis; this plasmid is named pSGK3.

Plasmid pSGK3 was cut with AatII and SmaI and treated with T4 DNA polymerase (NEB) to remove overhanging 3' ends (MANI82, SAMB89). Phosphorylated HindIII linkers (NEB) were ligated to the blunt ends of the DNA and, following HindIII digestion, the 1.1 kb fragment was isolated by agarose gel electrophoresis followed by purification on an Ultrafree-MC filter unit as recommended by the manufacturer (Millipore, Bedford, Mass.). M13-MB1/2-delta Rf DNA was cut with HindIII and the linearized Rf was purified and ligated to the 1.1 kb fragment derived from pSGK3. Ligation samples were used to transfect competent cells which were plated on LB plates containing Ap. Colonies were picked and grown in LB broth containing Ap overnight at 37° C. Aliquots of the culture supernatants were assayed for the presence of infectious phage. Rf DNA was prepared from cultures which were both Ap^(R) and contained infectious phage. Restriction enzyme analysis confirmed that the Rf contained a single copy of the Ap^(R) gene inserted into the intergenic region of the M13 genome in the same transcriptional orientation as the phage genes. This Rf DNA was designated MA.

The 5.9 kb BglII/BsmI fragment from MA Rf DNA and the 2.2 kb BglII/BsmI fragment from BPTI-III MK Rf DNA were ligated together and a portion of the ligation mixture was used to transfect competent cells which were subsequently plated to permit plaque formation on a lawn of cells. Large and small size plaques were observed on the plates. Small size plaques were picked for further analysis since BPTI-III fusion phage give rise to small plaques due to impairment of gene III protein function. Small plaques were added to LB broth containing Ap and cultures were incubated overnight at 37° C. An Ap^(R) culture which contained phage which gave rise to small plaques when plated on a lawn of cells was used as a source of Rf DNA. Restriction enzyme analysis confirmed that the BPTI-III fusion gene had been inserted into the MA vector. This Rf was designated BPTI-III MA.

4) Construction of BPTI(K15L)-III MA

MB46(K15L) Rf DNA was digested with XhoI and EagI and the 125 bp DNA fragment was isolated by electrophoresis on a 2% agarose gel followed by extraction from an agarose slice by centrifugation through an Ultrafree-MC filter unit. The 8.0 kb XhoI/EagI fragment derived from BPTI-III MA Rf was also prepared. The above two fragments were ligated and the ligation sample was used to transfect competent cells which were plated on LB plates containing Ap. Colonies were picked and used to inoculate LB broth containing Ap. Cultures were incubated overnight at 37° C. and phage within the culture supernatants were probed using the Dot Blot Procedure. Filters were hybridized to a radioactively labelled oligonucleotide (LEU1). Positive clones were identified by autoradiography after washing filters under high stringency conditions. Rf DNA was prepared from Ap^(R) cultures which contained phage carrying the K15L mutation. Restriction enzyme analysis and DNA sequencing confirmed that the K15L mutation had been introduced into the BPTI-III MA Rf. This Rf was designated BPTI(K15L)-III MA. Interestingly, BPTI(K15L)-III MA phage gave rise to extremely small plaques on a lawn of cells and the infectivity of the phage is 4 to 5 fold less than that of BPTI-III MK phage. This suggests that the substitution of LEU for LYS₁₅ impairs the ability of the BPTI:gene III fusion protein to mediate phage infection of bacterial cells.

5) Preparation of Immobilized Human Neutrophil Elastase

One ml of Reacti-Gel 6X CDI-activated agarose (Pierce Chemical Co.) in acetone (200 μl packed beads) was introduced into an empty Select-D spin column (5Prime-3Prime). The acetone was drained out and the beads were washed twice rapidly with 1.0 ml of ice cold water and 1.0 ml of ice cold 100 mM boric acid, pH 8.5, 0.9% NaCl. Two hundred μl of 2.0 mg/ml human neutrophil elastase (HNE) (CalBiochem, San Diego, Calif.) in borate buffer were added to the beads. The column was sealed and mixed end over end on a Labquake Shaker at 4° C. for 36 hours. The HNE solution was drained off and the beads were washed with ice cold 2.0 M Tris, pH 8.0 over a 2 hour period at 4° C. to block remaining reactive groups. A 50% slurry of the beads in TBS/BSA was prepared. To this was added an equal volume of sterile 100% glycerol and the beads were stored as a 25% slurry at -20° C. Prior to use, the beads were washed 3 times with TBS/BSA and a 50% slurry in TBS/BSA was prepared.

6) Characterization of the Affinity of BPTI-III MK and BPTI(K15L)-III MA Phage for Immobilized Trypsin and Human Neutrophil Elastase

Thirty μl of BPTI-III MK phage in TBS/BSA (1.7·10¹¹ pfu/ml) was added to 5 μl of a 50% slurry of either immobilized human neutrophil elastase or immobilized trypsin (Pierce Chemical Co.) also in TBS/BSA. Similarly, 30 μl of BPTI(K15L)-III MA phage in TBS/BSA (3.2·10¹⁰ pfu/ml) was added to either immobilized HNE or trypsin. Samples were mixed on a Labquake shaker for 3 hours. The beads were washed with 0.5 ml of TBS/BSA for 5 minutes and recovered by centrifugation. The supernatant was removed and the beads were washed 5 times with 0.5 ml of TBS/0.1% Tween-20. Finally, the beads were resuspended in 0.5 ml of elution buffer (0.1 M HCl containing 1.0 mg/ml BSA adjusted to pH 2.2 with glycine), mixed for 5 minutes and recovered by centrifugation. The supernatant fraction was removed, neutralized with 130 μl of 1 M Tris, pH 8.0, diluted in LB broth, and titered for plaque-forming units on a lawn of cells.
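The percentage of input phage recovered in an elution fraction follows from the input titer and volume and from the titer and volume of the neutralized eluate. The following is a minimal sketch of that bookkeeping; the function names are illustrative and the eluate titer in the example is a hypothetical value, not a measurement from this application.

```python
INPUT_VOLUME_ML = 0.030          # thirty microliters of phage per sample
ELUATE_VOLUME_ML = 0.500 + 0.130  # elution buffer plus neutralizing Tris

INPUT_TITERS = {                  # pfu/ml, as stated
    "BPTI-III MK": 1.7e11,
    "BPTI(K15L)-III MA": 3.2e10,
}


def input_pfu(phage: str) -> float:
    """Total plaque-forming units applied to the beads."""
    return INPUT_TITERS[phage] * INPUT_VOLUME_ML


def percent_recovered(phage: str, eluate_titer_pfu_per_ml: float) -> float:
    """Eluted pfu expressed as a percentage of the input pfu."""
    eluted = eluate_titer_pfu_per_ml * ELUATE_VOLUME_ML
    return 100.0 * eluted / input_pfu(phage)


# Hypothetical eluate titer, for illustration only.
example_percent = percent_recovered("BPTI-III MK", 1.0e8)
print(f"Input: {input_pfu('BPTI-III MK'):.2e} pfu")
print(f"Recovered: {example_percent:.2f}% of input")
```

Expressing recovery as a percentage of input allows phage stocks applied at different titers, as here, to be compared on the same scale.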

Table 202 illustrates that 82 times more of the BPTI-III MK input phage bound to the trypsin beads than to the HNE beads. By contrast, the BPTI(K15L)-III MA phage bound preferentially to HNE beads by a factor of 36. These results are consistent with the known affinities of wild type and the K15L variant of BPTI for trypsin and HNE. Hence BPTI-III fusion phage bind selectively to immobilized proteases and the nature of the BPTI variant displayed on the surface of the fusion phage dictates which particular protease is the optimum receptor for the fusion phage.

7) Effect of pH on the Dissociation of Bound BPTI-III MK and BPTI(K15L)-III MA Phage from Immobilized Neutrophil Elastase

The affinity of a given fusion phage for an immobilized serine protease can be characterized on the basis of the amount of bound fusion phage which elutes from the beads by washing with a pH 2.2 buffer. This represents rather extreme conditions for the dissociation of fusion phage from beads. Since the affinity of the BPTI variants described above for HNE is not high (K_(d) >1·10⁻⁹ M), it was anticipated that fusion phage displaying these variants might dissociate from HNE beads under less severe pH conditions. Furthermore, fusion phage might dissociate from HNE beads under specific pH conditions characteristic of the particular BPTI variant displayed by the phage. Low pH buffers providing stringent wash conditions might be required to dissociate fusion phage displaying a BPTI variant with a high affinity for HNE, whereas neutral pH conditions might be sufficient to dislodge a fusion phage displaying a BPTI variant with a weak affinity for HNE.

Thirty μl of BPTI(K15L)-III MA phage (1.7·10¹⁰ pfu/ml in TBS/BSA) were added to 5 μl of a 50% slurry of immobilized HNE also in TBS/BSA. Similarly, 30 μl of BPTI-III MA phage (8.6·10¹⁰ pfu/ml in TBS/BSA) were added to 5 μl of immobilized HNE. The above conditions were chosen to ensure that an approximately equivalent number of phage particles were added to the beads. The samples were incubated for 3 hours on a Labquake shaker. The beads were washed with 0.5 ml of TBS/BSA for 5 min on the shaker, recovered by centrifugation and the supernatant was removed. The beads were washed with 0.5 ml of TBS/0.1% Tween-20 for 5 minutes and recovered by centrifugation. Four additional washes with TBS/0.1% Tween-20 were performed as described above. The beads were washed as above with 0.5 ml of 100 mM sodium citrate, pH 7.0 containing 1.0 mg/ml BSA. The beads were recovered by centrifugation and the supernatant was removed. Subsequently, the HNE beads were washed sequentially with a series of 100 mM sodium citrate, 1.0 mg/ml BSA buffers of pH 6.0, 5.0, 4.0 and 3.0 and finally with the pH 2.2 elution buffer described above. The pH washes were neutralized by the addition of 1 M Tris, pH 8.0, diluted in LB broth and titered for plaque-forming units on a lawn of cells.

Table 203 illustrates that a low percentage of the input BPTI-III MK fusion phage adhered to the HNE beads and was recovered predominantly in the pH 7.0 and 6.0 washes. By contrast, a significantly higher percentage of the BPTI(K15L)-III MA phage bound to the HNE beads and was recovered predominantly in the pH 5.0 and 4.0 washes. Hence lower pH conditions (i.e. more stringent) are required to dissociate BPTI(K15L)-III MA than BPTI-III MK phage from immobilized HNE. The affinity of BPTI(K15L) for HNE is over 1000 times greater than that of BPTI (based on reported K_(d) values (BECK88b)). This suggests that lower pH conditions are indeed required to dissociate fusion phage displaying a BPTI variant with a higher affinity for HNE.

8) Construction of BPTI(MGNG)-III MA Phage

The light chain of bovine inter-α-trypsin inhibitor contains 2 domains highly homologous to BPTI. The amino terminal proximal domain (called BI-8e) has been generated by proteolysis and shown to be a potent inhibitor of HNE (K_(d) =4.4·10⁻¹¹ M) (ALBR83). By contrast, a BPTI variant with the single substitution of LEU for LYS₁₅ exhibits a moderate affinity for HNE (K_(d) =2.9·10⁻⁹ M) (BECK88b). It has been proposed that the P1 residue is the primary determinant of the specificity and potency of BPTI-like molecules (BECK88b, LASK80 and works cited therein). Although both BI-8e and BPTI(K15L) feature LEU at their respective P1 positions, there is a 66-fold difference in the affinities of these molecules for HNE. Structural features other than the P1 residue must contribute to the affinity of BPTI-like molecules for HNE.
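The 66-fold figure follows directly from the two reported dissociation constants. A minimal sketch of that arithmetic is given here in Python; the function name `fold_difference` is illustrative only and does not appear in the original disclosure.

```python
def fold_difference(kd_weaker: float, kd_tighter: float) -> float:
    """Return the ratio of two dissociation constants (both in mol/L)."""
    return kd_weaker / kd_tighter


# Values quoted in the text above.
KD_BI8E = 4.4e-11  # BI-8e for HNE (ALBR83)
KD_K15L = 2.9e-9   # BPTI(K15L) for HNE (BECK88b)

fold = fold_difference(KD_K15L, KD_BI8E)
print(f"BI-8e binds HNE about {fold:.0f}-fold more tightly than BPTI(K15L)")
```

The same ratio can be formed for any pair of variants whose K_(d) values are reported under comparable conditions.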

A comparison of the structures of BI-8e and BPTI(K15L) reveals the presence of three positively charged residues at positions 39, 41, and 42 of BPTI which are absent in BI-8e. These hydrophilic and highly charged residues of BPTI are displayed on a loop which underlies the loop containing the P1 residue and is connected to it via a disulfide bridge. Residues within the underlying loop (in particular residue 39) participate in the interaction of BPTI with the surface of trypsin near the catalytic pocket (BLOW72) and may contribute significantly to the tenacious binding of BPTI to trypsin. However, these hydrophilic residues might hamper the docking of BPTI variants with HNE. In support of this hypothesis, BI-8e displays a high affinity for HNE and contains no charged residues in the region spanning residues 39-42. Hence residues 39 through 42 of wild type BPTI were replaced with the corresponding residues of the human homologue of BI-8e. We anticipated that a BPTI derivative containing the MET-GLY-ASN-GLY (MGNG) sequence (SEQ ID NO: 12) would exhibit a higher affinity for HNE than corresponding derivatives which retain the sequence of wild type BPTI at residues 39-42.

A double stranded oligonucleotide with AccI and EagI compatible ends was designed to introduce the desired alteration of residues 39 to 42 via cassette mutagenesis. Codon 45 was altered to create a new XmnI site, unique in the structure of the BPTI gene, which could be used to screen for mutants. This alteration at codon 45 does not alter the encoded amino-acid sequence. BPTI-III MA Rf DNA was digested with AccI. Two oligonucleotides (CYSB and CYST) corresponding to the bottom and top strands of the mutagenic DNA were annealed and ligated to the AccI digested BPTI-III MA Rf DNA. The sample was digested with BglII and the 2.1 kb BglII/EagI fragment was purified. BPTI-III MA Rf was also digested with BglII and EagI and the 6.0 kb fragment was isolated and ligated to the 2.1 kb BglII/EagI fragment described above. Ligation samples were used to transfect competent cells which were plated to permit the formation of plaques on a lawn of cells. Phage derived from plaques were probed with a radioactively labelled oligonucleotide (CYSB) using the Dot Blot Procedure. Positive clones were identified by autoradiography of the Nytran membrane after washing at high stringency conditions. Rf DNA was prepared from Ap^(R) cultures containing fusion phage which hybridized to the CYSB probe. Restriction enzyme analysis and DNA sequencing confirmed that codons 39-42 of BPTI had been altered. The Rf DNA was designated BPTI(MGNG)-III MA.

9) Construction of BPTI(K15L,MGNG)-III MA

BPTI(MGNG)-III MA Rf DNA was digested with AccI and the 5.6 kb fragment was purified. BPTI(K15L)-III MA was digested with AccI and the 2.5 kb DNA fragment was purified. The two fragments above were ligated together and ligation samples were used to transfect competent cells which were plated for plaque production. Large and small plaques were observed on the plate. Representative plaques of each type were picked and phage were probed with the LEU1 oligonucleotide via the Dot Blot Procedure. After the Nytran filter had been washed under high stringency conditions, positive clones were identified by autoradiography. Only the phage which hybridized to the LEU1 oligonucleotide gave rise to the small plaques, confirming an earlier observation that substitution of LEU for LYS₁₅ substantially reduces phage infectivity. Appropriate cultures containing phage which hybridized to the LEU1 oligonucleotide were used to prepare Rf DNA. Restriction enzyme analysis and DNA sequencing confirmed that the K15L mutation had been introduced into BPTI(MGNG)-III MA. This Rf DNA was designated BPTI(K15L,MGNG)-III MA.

10) Effect of Mutation of Residues 39-42 of BPTI(K15L) on its Affinity for Immobilized HNE

Thirty μl of BPTI(K15L,MGNG)-III MA phage (9.2·10⁹ pfu/ml in TBS/BSA) were added to 5 μl of a 50% slurry of immobilized HNE also in TBS/BSA. Similarly, 30 μl of BPTI(K15L)-III MA phage (1.2·10¹⁰ pfu/ml in TBS/BSA) were added to immobilized HNE. The samples were incubated for 3 hours on a Labquake shaker. The beads were washed for 5 min with 0.5 ml of TBS/BSA and recovered by centrifugation. The beads were washed 5 times with 0.5 ml of TBS/0.1% Tween-20 as described above. Finally, the beads were washed sequentially with a series of 100 mM sodium citrate buffers of pH 7.0, 6.0, 5.5, 5.0, 4.75, 4.5, 4.25, 4.0 and 3.5 as described above. pH washes were neutralized, diluted in LB broth and titered for plaque-forming units on a lawn of cells.

Table 204 illustrates that almost twice as much of the BPTI(K15L,MGNG)-III MA as BPTI(K15L)-III MA phage bound to HNE beads. In both cases the pH 4.75 fraction contained the largest proportion of the recovered phage. This confirms that replacement of residues 39-42 of wild type BPTI with the corresponding residues of BI-8e enhances the binding of the BPTI(K15L) variant to HNE.

11) Fractionation of a Mixture of BPTI-III MK and BPTI(K15L,MGNG)-III MA Fusion Phage

The observations described above indicate that BPTI(K15L,MGNG)-III MA and BPTI-III MK phage exhibit different pH elution profiles from immobilized HNE. It seemed plausible that this property could be exploited to fractionate a mixture of different fusion phage.

Fifteen μl of BPTI-III MK phage (3.92·10¹⁰ pfu/ml in TBS/BSA), equivalent to 8.91·10⁷ Km^(R) transducing units, were added to 15 μl of BPTI(K15L,MGNG)-III MA phage (9.85·10⁹ pfu/ml in TBS/BSA), equivalent to 4.44·10⁷ Ap^(R) transducing units. Five μl of a 50% slurry of immobilized HNE in TBS/BSA was added to the phage and the sample was incubated for 3 hours on a Labquake mixer. The beads were washed for 5 minutes with 0.5 ml of TBS/BSA prior to being washed 5 times with 0.5 ml of TBS/2.0% Tween-20 as described above. Beads were washed for 5 minutes with 0.5 ml of 100 mM sodium citrate, pH 7.0 containing 1.0 mg/ml BSA. The beads were recovered by centrifugation and the supernatant was removed. Subsequently, the HNE beads were washed sequentially with a series of 100 mM citrate buffers of pH 6.0, 5.0 and 4.0. The pH washes were neutralized by the addition of 130 μl of 1 M Tris, pH 8.0.

The relative proportion of BPTI-III MK and BPTI(K15L,MGNG)-III MA phage in each pH fraction was evaluated by determining the number of phage able to transduce cells to Km^(R) as opposed to Ap^(R). Fusion phage diluted in 1×Minimal A salts were added to 100 μl of cells (O.D.600=0.8, concentrated to 1/20 original culture volume) also in Minimal salts in a final volume of 200 μl. The sample was incubated for 15 min at 37° C. prior to the addition of 200 μl of 2×LB broth. After an additional 15 min incubation at 37° C., duplicate aliquots of cells were plated on LB plates containing either Ap or Km to permit the formation of colonies. Bacterial colonies on each type of plate were counted and the data were used to calculate the number of Ap^(R) and Km^(R) transducing units in each pH fraction. The number of Ap^(R) transducing units is indicative of the amount of BPTI(K15L,MGNG)-III MA phage in each pH fraction, while the total number of Km^(R) transducing units is indicative of the amount of BPTI-III MK phage.

Table 205 illustrates that a low percentage of the BPTI-III MK input phage (as judged by Km^(R) transducing units) adhered to the HNE beads and was recovered predominantly in the pH 7.0 fraction. By contrast, a significantly higher percentage of the BPTI(K15L,MGNG)-III MA phage (as judged by Ap^(R) transducing units) adhered to the HNE beads and was recovered predominantly in the pH 4.0 fraction. A comparison of the total number of Ap^(R) and Km^(R) transducing units in the pH 4.0 fraction shows that a 984-fold enrichment of BPTI(K15L,MGNG)-III MA phage over BPTI-III MK phage was achieved. Hence, the above procedure can be utilized to fractionate mixtures of fusion phage on the basis of their relative affinities for immobilized HNE.
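An enrichment factor of this kind can be expressed as the change in the Ap^(R):Km^(R) ratio between the input mixture and a given pH fraction. The sketch below shows one plausible way to compute it. The input values are those stated above; the fraction counts are hypothetical placeholders, not the Table 205 data, and the exact definition behind the 984-fold figure may differ from the one assumed here.

```python
def enrichment(ap_out: float, km_out: float, ap_in: float, km_in: float) -> float:
    """Fold change in the Ap^R : Km^R ratio of a fraction relative to the input."""
    return (ap_out / km_out) / (ap_in / km_in)


# Input transducing units, as stated in the text.
AP_IN = 4.44e7  # BPTI(K15L,MGNG)-III MA phage (Ap^R)
KM_IN = 8.91e7  # BPTI-III MK phage (Km^R)

# Hypothetical counts for one pH fraction (NOT taken from Table 205).
ap_fraction = 2.0e5
km_fraction = 4.0e2

print(f"enrichment: {enrichment(ap_fraction, km_fraction, AP_IN, KM_IN):.0f}-fold")
```

Normalizing by the input ratio matters here because the two phage were not mixed in equal numbers of transducing units.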

12) Construction of BPTI(K15V,R17L)-III MA

A BPTI variant containing the alterations K15V and R17L demonstrates the highest affinity for HNE of any BPTI variant described to date (K_(d) =6·10⁻¹¹ M) (AUER89). As a means of testing the selection system described herein, a fusion phage displaying this variant of BPTI was generated and used as a "reference" phage to characterize the affinity for immobilized HNE of fusion phage displaying a BPTI variant with a known affinity for free HNE. A 76 bp mutagenic oligonucleotide (VAL1) was designed to convert the LYS₁₅ codon (AAA) to a VAL codon (GTT) and the ARG₁₇ codon (CGA) to a LEU codon (CTG). At the same time codons 11, 12 and 13 were altered to destroy the ApaI site resident in the wild type BPTI gene while creating a new RsrII site, which could be used to screen for correct clones.

The single stranded VAL1 oligonucleotide was converted to the double stranded form following the procedure described in Current Protocols in Molecular Biology (AUSU87). One μg of the VAL1 oligonucleotide was annealed to one μg of a 20 bp primer (MB8). The sample was heated to 80° C., cooled to 62° C. and incubated at this temperature for 30 minutes before being allowed to cool to 37° C. Two μl of a 2.5 mM mixture of dNTPs and 10 units of Sequenase (U.S.B., Cleveland, Ohio) were added to the sample and second strand synthesis was allowed to proceed for 45 minutes at 37° C. One hundred units of XhoI were added to the sample and digestion was allowed to proceed for 2 hours at 37° C. in 100 μl of 1×XhoI digestion buffer. The digested DNA was subjected to electrophoresis on a 4% GTG NuSieve agarose (FMC Bioproducts, Rockland, Me.) gel and the 65 bp fragment was excised and purified from melted agarose by phenol extraction and ethanol precipitation. A portion of the recovered 65 bp fragment was subjected to electrophoresis on a 4% GTG NuSieve agarose gel for quantitation. One hundred nanograms of the recovered fragment was dephosphorylated with 1.9 μl of HK(™) phosphatase (Epicentre Technologies, Madison, Wis.) at 37° C. for 60 minutes. The reaction was stopped by heating at 65° C. for 15 minutes. BPTI-III MA Rf DNA was digested with XhoI and StuI and the 8.0 kb fragment was isolated. One μl of the dephosphorylation reaction (5 ng of double-stranded VAL1 oligonucleotide) was ligated to 50 ng of the 8.0 kb XhoI/StuI fragment derived from BPTI-III MA Rf. Ligation samples were subjected to phenol extraction and DNA was recovered by ethanol precipitation. Portions of the recovered ligation DNA were added to 40 μl of electrocompetent cells which were shocked using a Bio-Rad Gene Pulser device set at 1.7 kV, 25 μF and 800 Ω. One ml of SOC media was immediately added to the cells which were allowed to recover at 37° C. for one hour. Aliquots of the electroporated cells were plated onto LB plates containing Ap to permit the formation of colonies.

Phage contained within cultures derived from picked Ap^(R) colonies were probed with two radiolabelled oligonucleotides (PRP1 and ESP1) via the Dot Blot Procedure. Rf DNA was prepared from cultures containing phage which exhibited a strong hybridization signal with the ESP1 oligonucleotide but not with the PRP1 oligonucleotide. Restriction enzyme analysis verified loss of the ApaI site and acquisition of a new RsrII site diagnostic for the changes in the P1 region. Fusion phage were also probed with a radiolabelled oligonucleotide (VLP1) via the Dot Blot Procedure. Autoradiography confirmed that fusion phage which previously failed to hybridize to the PRP1 probe hybridized to the VLP1 probe. DNA sequencing confirmed that the LYS₁₅ and ARG₁₇ codons had been converted to VAL and LEU codons respectively. The Rf DNA was designated BPTI(K15V,R17L)-III MA.

13) Affinity of BPTI(K15V,R17L)-III MA Phage for Immobilized HNE

Forty μl of BPTI(K15V,R17L)-III MA phage (9.8·10¹⁰ pfu/ml) in TBS/BSA were added to 10 μl of a 50% slurry of immobilized HNE also in TBS/BSA. Similarly, 40 μl of BPTI(K15L,MGNG)-III MA phage (5.13·10⁹ pfu/ml) in TBS/BSA were added to immobilized HNE. The samples were mixed for 1.5 hours on a Labquake shaker. Beads were washed once for 5 min with 0.5 ml of TBS/BSA and then 5 times with 0.5 ml of TBS/1.0% Tween-20 as described previously. Subsequently the beads were washed sequentially with a series of 50 mM sodium citrate buffers containing 150 mM NaCl, 1.0 mg/ml BSA of pH 7.0, 6.0, 5.0, 4.5, 4.0, 3.75, 3.5 and 3.0. In the case of the BPTI(K15L,MGNG)-III MA phage, the pH 3.75 and 3.0 washes were omitted. Two washes were performed at each pH and the supernatants were pooled, neutralized with 1 M Tris pH 8.0, diluted in LB broth and titered for plaque-forming units on a lawn of cells.

Table 206 illustrates that the pH 4.5 and 4.0 fractions contained the largest proportion of the recovered BPTI(K15V,R17L)-III MA phage. By contrast, the BPTI(K15L,MGNG)-III MA phage, like BPTI(K15L)-III MA phage, were recovered predominantly in the pH 5.0 and 4.5 fractions, as shown above. The affinity of BPTI(K15V,R17L) for HNE is 48 times greater than that of BPTI(K15L) (based on reported K_(d) values, AUER89 for BPTI(K15V,R17L) and BECK88b for BPTI(K15L)). That the pH elution profile for BPTI(K15V,R17L)-III MA phage exhibits a peak at pH 4.0 while the profile for BPTI(K15L)-III MA phage displays a peak at pH 4.5 supports the contention that lower pH conditions are required to dissociate, from immobilized HNE, fusion phage displaying a BPTI variant with a higher affinity for free HNE.

EXAMPLE IV CONSTRUCTION OF A VARIEGATED POPULATION OF PHAGE DISPLAYING BPTI DERIVATIVES AND FRACTIONATION FOR MEMBERS THAT DISPLAY BINDING DOMAINS HAVING HIGH AFFINITY FOR HUMAN NEUTROPHIL ELASTASE

We here describe generation of a library of 1000 different potential engineered protease inhibitors (PEPIs) and the fractionation with immobilized HNE to obtain an engineered protease inhibitor (Epi) having high affinity for HNE. Successful Epis that bind HNE are designated EpiNEs.

1) Design of a Mutagenic Oligonucleotide to Create a Library of Fusion Phage

A 76 bp variegated oligonucleotide (MYMUT) was designed to construct a library of fusion phage displaying 1000 different PEPIs derived from BPTI. The oligonucleotide contains 1728 different DNA sequences but, due to the degeneracy of the genetic code, it encodes 1000 different protein sequences. The oligonucleotide was designed so as to destroy an ApaI site (shown in Table 113) encompassing codons 12 and 13. ApaI digestion could be used to select against the parental Rf DNA used to construct the library.

The MYMUT oligonucleotide permits the substitution of 5 hydrophobic residues (PHE, LEU, ILE, VAL, and MET via a DTS codon (D=approximately equimolar A, T, and G; S=approximately equimolar C and G)) for LYS₁₅. Replacement of LYS₁₅ in BPTI with aliphatic hydrophobic residues via semi-synthesis has provided proteins having higher affinity for HNE than BPTI (TANK77, JERI74a,b, WENZ80, TSCH86, BECK88b). At position 16, either GLY or ALA is permitted (GST codon). This is in keeping with the predominance of these two residues at the corresponding positions in a variety of BPTI homologues (CREI87). The variegation scheme at position 17 is identical to that at 15. Limited data are available on the relative contribution of this residue to the interaction of BPTI homologues with HNE. A variety of hydrophobic residues at position 17 was included with the anticipation that they would enhance the docking of a BPTI variant with HNE. Finally, at positions 18 and 19, 4 (PHE, SER, THR, and ILE via a WYC codon (W=approximately equimolar A and T; Y=approximately equimolar T and C)) and 5 (SER, PRO, THR, LYS, GLN, and stop via an HMA codon (H=approximately equimolar A, C, and T; M=approximately equimolar A and C)) different amino acids respectively are encoded. These different amino acid residues are found in the corresponding positions of BPTI homologues that are known to bind to HNE (CREI87). Although the amino acids included in the PEPI library were chosen because there was some indication that they might facilitate binding to HNE, it was not and is not possible to predict which combination of these amino acids will lead to high affinity for HNE. The mutagenic oligonucleotide MYMUT was synthesized by Genetic Design Inc. (Houston, Tx.).
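The counts of 1728 DNA sequences and 1000 protein sequences can be checked by enumerating the five variegated codons against the standard genetic code. The following Python sketch does this; the helper names (`expand`, `residues`, `SCHEME`) are illustrative and not part of the original disclosure, and the mixed-base definitions are those given in the text.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Mixed-base symbols as defined in the text.
MIX = {"A": "A", "C": "C", "G": "G", "T": "T",
       "D": "ATG", "S": "CG", "W": "AT", "Y": "TC", "H": "ACT", "M": "AC"}


def expand(codon: str) -> list:
    """List every concrete codon represented by a variegated codon."""
    return ["".join(p) for p in product(*(MIX[b] for b in codon))]


def residues(codon: str) -> set:
    """One-letter residues (with '*' for stop) encoded by a variegated codon."""
    return {CODON_TABLE[c] for c in expand(codon)}


# Variegation scheme at BPTI positions 15-19.
SCHEME = {15: "DTS", 16: "GST", 17: "DTS", 18: "WYC", 19: "HMA"}

dna_total = 1
protein_total = 1
for position, codon in SCHEME.items():
    dna_total *= len(expand(codon))
    protein_total *= len(residues(codon) - {"*"})  # stop codons excluded
    print(position, codon, sorted(residues(codon)))

print("DNA sequences:", dna_total)
print("protein sequences:", protein_total)
```

The gap between the two totals comes from the two DTS positions, where six codons encode five residues, and from the HMA position, where one of the six codons is a stop.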

2) Construction of Library of Fusion Phage Displaying Potential Engineered Protease Inhibitors

The single-stranded mutagenic MYMUT DNA was converted to the double stranded form with compatible XhoI and StuI ends and dephosphorylated with HK(™) phosphatase as described above for the VAL1 oligonucleotide. BPTI(MGNG)-III MA Rf DNA was digested with XhoI and StuI for 3 hours at 37° C. to ensure complete digestion. The 8.0 kb DNA fragment was purified by agarose gel electrophoresis and Ultrafree-MC unit filtration. One μl of the dephosphorylated MYMUT DNA (5 ng) was ligated to 50 ng of the 8.0 kb fragment derived from BPTI(MGNG)-III MA Rf DNA. Under these conditions, the 10:1 molar ratio of insert to vector was found to be optimal for the generation of transformants. Ligation samples were extracted with phenol, phenol/chloroform/IAA (25:24:1, v:v:v) and chloroform/IAA (24:1, v:v) and DNA was ethanol precipitated prior to electroporation. One μl of the recovered ligation DNA was added to 40 μl of electro-competent cells. Cells were shocked using a Bio-Rad Gene Pulser device as described above. Immediately following electroshock, 1.0 ml of SOC media was added to the cells which were allowed to recover at 37° C. for 60 minutes with shaking. The electroporated cells were plated onto LB plates containing Ap to permit the formation of colonies.
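The insert-to-vector molar ratio can be approximated from the masses and fragment lengths, since for double-stranded DNA the number of moles is proportional to mass divided by length. The sketch below assumes the double-stranded MYMUT fragment is 65 bp, the length reported above for the analogous VAL1 fragment; that length is an assumption, as the text does not state it for MYMUT.

```python
def relative_moles(mass_ng: float, length_bp: float) -> float:
    """Quantity proportional to moles of double-stranded DNA (mass / length)."""
    return mass_ng / length_bp


INSERT_NG = 5.0     # dephosphorylated MYMUT DNA, from the text
INSERT_BP = 65.0    # ASSUMED: same length as the VAL1 XhoI fragment
VECTOR_NG = 50.0    # 8.0 kb XhoI/StuI fragment, from the text
VECTOR_BP = 8000.0

ratio = relative_moles(INSERT_NG, INSERT_BP) / relative_moles(VECTOR_NG, VECTOR_BP)
print(f"insert:vector molar ratio is roughly {ratio:.0f}:1")
```

Under that assumption the result is of the same order as the 10:1 ratio quoted in the text.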

To assess the efficiency of the cassette mutagenesis procedure, 39 transformants were picked at random and phage present in culture supernatants were applied to a Nytran membrane and probed using the Dot Blot Procedure. Two Nytran membranes were prepared in this manner. The first filter was allowed to hybridize to the CYSB oligonucleotide which had previously been radiolabelled. The second membrane was allowed to hybridize to the PRP1 oligonucleotide which had also been radiolabelled. Filters were subjected to autoradiography following washing under high stringency conditions. Of the 39 phage samples applied to the membrane, all 39 hybridized to the CYSB probe. This indicated that there was fusion phage in the culture supernatants and that at least the DNA encoding residues 35-47 appeared to be present in the phage genomes. Only 11 of the 39 samples hybridized to the PRP1 oligonucleotide, indicating that 28% of the transformants were probably the parental phage BPTI(MGNG)-III MA used to generate the library. The remaining 28 clones failed to hybridize to the PRP1 probe, indicating that substantial alterations were introduced into the P1 region by cassette mutagenesis using the MYMUT oligonucleotide. Of these 28 samples, all were found to contain infectious phage, indicating that mutagenesis did not result in frame shift mutations which would lead to the generation of defective gene III products and non-infectious phage. (These 28 PEPI-displaying phage constitute a mini-library, the fractionation of which is discussed below.) Hence the overall efficiency of mutagenesis was estimated to be 72% in those cases where ligation DNA was not subjected to ApaI digestion prior to electroporation.

Bacterial colonies were harvested by overlaying chilled LB plates containing Ap with 5 ml of ice cold LB broth and scraping off cells using a sterile glass rod. A total of 4899 transformants were harvested in this manner, of which 3299 were obtained by electroporation of ligation samples which were not digested with ApaI. Hence we estimate that 72% of these transformants (i.e. 2375) represent mutants of the parental BPTI(MGNG)-III MA phage derived by cassette mutagenesis of the P1 position. An additional 1600 transformants were obtained by electroporation of ligation samples which had been digested with ApaI. If we assume that all of these clones contain new sequences at the P1 position, then the total number of mutants in the pool of 4899 transformants is estimated to be 2375+1600=3975. The total number of potentially different DNA sequences in the MYMUT library is 1728. We calculate that the library should display about 90% of the potential engineered protease inhibitor sequences as follows: ##EQU3##
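The equation itself is not reproduced in this text. One common way to arrive at an estimate of this kind is to assume that each independent mutant is drawn uniformly at random from the 1728 possible DNA sequences, so that the expected fraction of sequences sampled at least once is 1 − exp(−N/V). The sketch below applies that assumption to the numbers given above; it is offered as a plausible reading, not as the calculation the authors necessarily used.

```python
import math

TRANSFORMANTS_UNDIGESTED = 3299  # ligation not digested with ApaI
TRANSFORMANTS_APAI = 1600        # ligation digested with ApaI
MUTANT_FRACTION = 0.72           # mutagenesis efficiency without ApaI
LIBRARY_SIZE = 1728              # possible DNA sequences in MYMUT

# Estimated number of independent mutants in the pooled library.
mutants = round(MUTANT_FRACTION * TRANSFORMANTS_UNDIGESTED) + TRANSFORMANTS_APAI

# ASSUMPTION: uniform sampling with replacement (Poisson approximation).
coverage = 1.0 - math.exp(-mutants / LIBRARY_SIZE)

print("estimated mutants:", mutants)
print(f"expected fraction of sequences represented: {coverage:.2f}")
```

With these inputs the Poisson estimate agrees with the figure of about 90% stated in the text.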

3) Fractionation of a Mini-Library of Fusion Phage

We studied the fractionation of the mini-library of 28 PEPIs to establish the appropriate parameters for fractionation of the entire MYMUT PEPI library. We anticipated that fractionation could be easier when the library of fusion phage was much less diverse than the entire MYMUT library. Fewer cycles of fractionation might be required to affinity purify a fusion phage exhibiting a high affinity for HNE. Secondly, since the sequences of all the fusion phage in the mini-library can be determined, one can determine the probability of selecting a given fusion phage from the initial population.

Two ml of the culture supernatants of the 28 PEPIs described above were pooled. Fusion phage were recovered, resuspended in 300 mM NaCl, 100 mM Tris, pH 8.0, 1 mM EDTA and stored on ice for 15 minutes. Insoluble material was removed by centrifugation for 3 minutes in a microfuge at 4° C. The supernatant fraction was collected and PEPI phage were precipitated with PEG-8000. The final phage pellet was resuspended in TBS/BSA. Aliquots of the recovered phage were titered for plaque-forming units on a lawn of cells. The final stock solution consisted of 200 μl of fusion phage at a concentration of 5.6·10¹² pfu/ml.

a) First Enrichment Cycle

Forty μl of the above phage stock was added to 10 μl of a 50% slurry of HNE beads in TBS/BSA. The sample was allowed to mix on a Labquake shaker for 1.5 hours. Five hundred μl of TBS/BSA was added to the sample and after an additional 5 minutes of mixing, the HNE beads were collected by centrifugation. The supernatant fraction was removed and the beads were resuspended in 0.5 ml of TBS/0.5% Tween-20. Beads were washed for 5 minutes on the shaker and recovered by centrifugation as above. The supernatant fraction was removed and the beads were subjected to 4 additional washes with TBS/Tween-20 as described above to reduce non-specific binding of fusion phage to HNE beads. Beads were washed twice as above with 0.5 ml of 50 mM sodium citrate pH 7.0, 150 mM NaCl containing 1.0 mg/ml BSA. The supernatants from the two washes were pooled. Subsequently, the HNE beads were washed sequentially with a series of 50 mM sodium citrate, 150 mM NaCl, 1.0 mg/ml BSA buffers of pH 6.0, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5 and 2.0. Two washes were performed at each pH and the supernatants were pooled and neutralized by the addition of 260 μl of 1 M Tris, pH 8.0. Aliquots of each pH fraction were diluted in LB broth and titered for plaque-forming units on a lawn of cells. The total amount of fusion phage (as judged by pfu) appearing in each pH wash fraction was determined.
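A pH elution profile of the kind described can be tabulated by expressing the phage recovered in each wash as a percentage of the input. The following sketch illustrates the bookkeeping. The input volume and titer are those stated above; the per-fraction pfu values are hypothetical placeholders chosen only to make the example run, and the function names are illustrative.

```python
def input_pfu(volume_ul: float, titer_pfu_per_ml: float) -> float:
    """Total plaque-forming units added, from a volume in microliters."""
    return volume_ul / 1000.0 * titer_pfu_per_ml


def percent_of_input(fraction_pfu: dict, total_input: float) -> dict:
    """Convert pfu recovered at each pH into a percentage of the input."""
    return {ph: 100.0 * pfu / total_input for ph, pfu in fraction_pfu.items()}


# 40 microliters of the 5.6e12 pfu/ml mini-library stock (from the text).
total = input_pfu(40, 5.6e12)

# HYPOTHETICAL pfu recovered per pH wash (not data from FIG. 7).
fractions = {7.0: 1.0e6, 6.0: 3.0e6, 5.0: 9.0e6, 4.5: 4.0e6, 4.0: 2.0e6,
             3.5: 1.0e6, 3.0: 5.0e5, 2.5: 2.0e5, 2.0: 1.0e5}

profile = percent_of_input(fractions, total)
peak_ph = max(profile, key=profile.get)

for ph, pct in profile.items():
    print(f"pH {ph}: {pct:.2e} % of input")
print("peak fraction: pH", peak_ph)
```

Reporting each fraction relative to input, rather than as a raw titer, is what allows profiles from successive enrichment cycles with different inputs to be compared.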

FIG. 7 illustrates that the largest percentage of input phage which bound to the HNE beads was recovered in the pH 5.0 fraction. The elution peak exhibits a trailing edge on the low pH side, suggesting that a small proportion of the total bound fusion phage might elute from the HNE beads at a pH <5. BPTI(K15L)-III phage display a BPTI variant with a moderate affinity for HNE (K_(d) =2.9·10⁻⁹ M) (BECK88b). Since BPTI(K15L)-III phage elute from HNE beads as a peak centered on pH 4.75 and the highest peak in the first passage of the mini-library over HNE beads is centered on pH 5.0, we infer that many members of the MYMUT PEPI mini-library display PEPIs having moderate to high affinity for HNE.

To enrich for fusion phage displaying the highest affinity for HNE, phage contained in the lowest pH fraction (pH 2.0) from the first enrichment cycle were amplified and subjected to a second round of fractionation. Amplification involved the Transduction Procedure described above. Fusion phage (2000 pfu) were incubated with 100 μl of cells for 15 minutes at 37° C. in 200 μl of 1×Minimal A salts. Two hundred μl of 2×LB broth was added to the sample and cells were allowed to recover for 15 minutes at 37° C. with shaking. One hundred μl portions of the above sample were plated onto LB plates containing Ap. Five such transduction reactions were performed, yielding a total of 20 plates, each containing approximately 350 colonies (7000 transformants in total). Bacterial cells were harvested as described for the preparation of the MYMUT library and fusion phage were collected as described for the preparation of the mini-library. A total of 200 μl of fusion phage (4.3·10¹² pfu/ml in TBS/BSA) derived from the pH 2.0 fraction from the first passage of the mini-library was obtained in this manner.

b) Second Enrichment Cycle

Forty μl of the above phage stock was added to 10 μl of a 50% slurry of HNE beads in TBS/BSA. The sample was allowed to mix for 1.5 hours and the HNE beads were washed with TBS/BSA, TBS/0.5% Tween and sodium citrate buffers as described above. Aliquots of neutralized pH fractions were diluted and titered as described above.

The elution profile for the second passage of the mini-library over HNE beads is shown in FIG. 7. The largest percentage of the input phage which bound to the HNE beads was recovered in the pH 3.5 wash. A smaller peak centered on pH 4.5 may represent residual fusion phage from the first passage of the mini-library which eluted at pH 5.0. The percentage of total input phage which eluted at pH 3.5 in the second cycle exceeds the percentage of input phage which eluted at pH 5.0 in the first cycle. This is indicative of more avid binding of fusion phage to the HNE matrix. Taken together, the significant shift in the pH elution profile suggests that selection for fusion phage displaying BPTI variants with higher affinity for HNE occurred.

c) Third Cycle

Phage obtained in the pH 2.0 fraction from the second passage of the mini-library were amplified as above and subjected to a third round of fractionation. The pH elution profile is shown in FIG. 7. The largest percentage of input phage was recovered in the pH 3.5 wash, as is the case with the second passage of the mini-library. However, the minor peak centered on pH 4.5 is diminished in the third passage relative to the second passage. Furthermore, the percentage of input phage which eluted at pH 3.5 is greater in the third passage than in the second passage. In comparison, the BPTI(K15V,R17L)-III fusion phage elute from HNE beads as a peak centered on pH 4.25. Taken together, the data suggest that a significant selection for fusion phage displaying PEPIs with high affinity for HNE occurred. Furthermore, since more extreme pH conditions are required to elute fusion phage in the third passage of the MYMUT library relative to those conditions needed to elute BPTI(K15V,R17L)-III MA phage, this suggests that those fusion phage which appear in the pH 3.5 fraction may display a PEPI with a higher affinity for HNE than the BPTI(K15V,R17L) variant (i.e. K_(d) <6·10⁻¹¹ M).

Characterization of Selected Fusion Phage

The pH 2.0 fraction from the third passage of the mini-library was titered and plaques were obtained on a lawn of cells. Twenty plaques were picked at random and phage derived from plaques were probed with the CYSB oligonucleotide via the Dot Blot Procedure. Autoradiography of the filter revealed that all 20 samples gave a positive hybridization signal, indicating that fusion phage were present and that the DNA encoding residues 35 to 47 of BPTI(MGNG) is contained within the recombinant M13 genomes. Rf DNA was prepared for the 20 clones and initial dideoxy sequencing revealed that 12 clones were identical. This sequence was designated EpiNEα (SEQ ID NO:45) (Table 207). No DNA sequence changes were observed apart from the planned variegation. Hence the cassette mutagenesis procedure preserved the context of the planned variegation of the pepi gene. The Dot Blot Procedure was employed to probe all 20 selected clones from the pH 2.0 fraction from the third passage of the mini-library with an oligonucleotide homologous to the sequence of EpiNEα. Following high stringency washing, autoradiography revealed that all 20 selected clones were identical in the P1 region. Furthermore, dot blot analysis revealed that of the 28 different phage samples pooled to create the mini-library, only one contained the EpiNEα sequence. Hence in just three passes of the mini-library over HNE beads, 1 out of 28 input fusion phage was selected for and appears as a pure population in the lowest pH fraction from the third passage of the library. That the EpiNEα phage elute at pH 3.5 while BPTI(K15V,R17L)-III MA phage elute at a higher pH strongly suggests that the EpiNEα protein has a significantly higher affinity than BPTI(K15V,R17L) for HNE.

4) Fractionation of the MYMUT Library

a) Three Cycles of Enrichment

The same procedure used above to fractionate the mini-library was used to fractionate the entire MYMUT PEPI library consisting of fusion phage displaying 1000 different proteins. The phage inputs for the first, second and third rounds of fractionation were 4.0·10¹¹, 5.8·10¹⁰, and 1.1·10¹¹ pfu respectively. FIG. 8 illustrates that the largest percentage of input phage which bound to the HNE matrix was recovered in the pH 5.0 wash in the first enrichment cycle. The pH elution profile is very similar to that seen for the first passage of the mini-library over HNE beads. A trailing edge is also observed on the low pH side of the pH 5.0 peak; however, this is not as prominent as that observed for the mini-library. The percentage of input phage which eluted in the pH 7.0 wash was greater than that eluted in the pH 6.0 wash. This is in contrast to the result obtained for the first passage of the mini-library and may reflect the presence of ≈20% parental BPTI(MGNG)-III MA phage in the MYMUT library pool. These phage adhere to the HNE beads weakly (if at all) and elute in the pH 7.0 fraction. That no parent phage were present in the mini-library is consistent with the absence of a peak at pH 7.0 in the first passage of the mini-library.

Phage present in the pH 2.0 fraction from the first passage of the MYMUT library were amplified as described previously and subjected to a second round of fractionation. The largest percentage of input phage which bound to the HNE beads was recovered in the pH 3.5 wash (FIG. 8). A minor peak centered on pH 4.5 was also evident. The fact that more extreme pH conditions were required to elute the majority of bound fusion phage suggested that selection of fusion phage displaying PEPIs with higher affinity for HNE had occurred. This was also indicated by the fact that the total percentage of input phage which appeared in the pH 3.5 wash in the second enrichment cycle was 10 times greater than the percentage of input which appeared in the pH 5.0 wash in the first cycle.

Fusion phage from the pH 2.0 fraction of the second pass of the MYMUT library were amplified and subjected to a third passage over HNE beads. The proportion of fusion phage appearing in the pH 3.5 fraction relative to that in the pH 4.5 fraction was greater in the third passage than in the second passage (FIG. 8). Also, the amount of fusion phage appearing in the pH 3.5 fraction was higher in the third passage than in the second passage. The fact that wash conditions below pH 4.25 were required to elute bound fusion phage derived from the MYMUT library suggests that the EpiNEs displayed by these phage possess a higher affinity for HNE than the BPTI(K15V,R17L) variant.

b) Characterization of Selected Clones

The pH 2.0 fraction from the third enrichment cycle of the MYMUT library was titered on a lawn of cells. Twenty plaques were picked at random. Rf DNA was prepared for each of the clones and fusion phage were collected by PEG precipitation. Clonally pure populations of fusion phage in TBS/BSA were prepared and characterized with respect to their affinity for immobilized HNE. pH elution profiles were obtained to determine the stringency of the conditions required to elute bound fusion phage from the HNE matrix. FIG. 9 illustrates the pH profiles obtained for EpiNE clones 1 (SEQ ID NO: 51), 3 (SEQ ID NO: 46), and 7 (SEQ ID NO: 48). The pH profiles for all 3 clones exhibit a peak centered on pH 3.5. Unlike the pH profile obtained for the third passage of the MYMUT library, no minor peak centered on pH 4.5 is evident. This is consistent with the clonal purity of the selected EpiNE phage utilized to generate the profiles. The elution peaks are not symmetrical and show a prominent trailing edge on the low pH side. In all probability, the 10 minute elution period employed is inadequate to remove bound fusion phage at the low pH conditions. EpiNE clones 1 through 8 have the following characteristics: five clones (identified as EpiNE1 (SEQ ID NO: 1), EpiNE3 (SEQ ID NO: 46), EpiNE5 (SEQ ID NO: 52), EpiNE6 (SEQ ID NO: 47), and EpiNE7 (SEQ ID NO: 48)) display very similar pH profiles centered on pH 3.5. The remaining 3 clones elute in the pH 3.5 to 4.0 range. There remains some diversity amongst the 20 randomly chosen clones obtained from the pH 2.0 fraction of the third passage of the MYMUT library, and these clones might exhibit different affinities for HNE.

c) Sequences of the EpiNE Clones

The DNA sequences encoding the P1 regions of the different EpiNE clones were determined by dideoxy sequencing of Rf DNA. The sequences are shown in Table 208. Essentially, only the codons targeted for mutagenesis (i.e. 15 to 19) were altered as a consequence of cassette mutagenesis using the MYMUT oligonucleotide. Only 1 codon outside the target region was found to contain an unexpected alteration. In this case, codon 21 of EpiNE8 was altered from a tyrosine codon (TAT) to a SER codon (TCT) by a single nucleotide substitution. This error could have been introduced into the MYMUT oligonucleotide during its synthesis. Alternatively, an error could have been introduced when the single-stranded MYMUT oligonucleotide was converted to the double-stranded form by Sequenase. Regardless of the reason, the error rate is extremely low considering only 1 unexpected alteration was observed after sequencing 20 codons in 19 different clones. Furthermore, the value of such a mutation is not diminished by its accidental nature.

Some of the EpiNE clones are identical. The sequences of EpiNE1, EpiNE3, and EpiNE7 appear a total of 4, 6 and 5 times respectively. Assuming the 1745 potentially different DNA sequences encoded by the MYMUT oligonucleotide were present at equal frequency in the fusion phage library, the frequent appearance of the sequences for clones EpiNE1, EpiNE3, and EpiNE7 may have important implications: EpiNE1, EpiNE3, and EpiNE7 fusion phage may display BPTI variants with the highest affinity for HNE of all the 1000 potentially different BPTI variants in the MYMUT library.
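The argument from repeated isolation can be made quantitative with a small sketch. It keeps the text's own assumption that the 1745 DNA sequences are equally abundant, and it adds a second assumption that is not in the text: that the 20 sequenced clones behave like independent draws from the library if no selection occurs. The function name `prob_at_least` is illustrative only.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


LIBRARY_DNA_SEQUENCES = 1745  # potentially different DNA sequences (from the text)
CLONES_SEQUENCED = 20         # plaques picked at random (from the text)
observed = {"EpiNE1": 4, "EpiNE3": 6, "EpiNE7": 5}  # recurrences (from the text)

# Chance that one pre-specified sequence recurs this often without any selection.
p_uniform = 1 / LIBRARY_DNA_SEQUENCES
chance = {
    name: prob_at_least(k, CLONES_SEQUENCED, p_uniform)
    for name, k in observed.items()
}

for name, value in chance.items():
    print(f"{name}: P(>= {observed[name]} of {CLONES_SEQUENCED}) = {value:.2e}")
```

Under these assumptions the recurrences are very unlikely to arise by chance, which is in line with the inference that the recurring clones were enriched by the fractionation. The sketch says nothing about whether the enrichment reflects affinity alone.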

An examination of the sequences of the EpiNE clones is illuminating. A strong preference for either VAL or ILE at the P1 position (residue 15) is indicated, with VAL being favored over ILE by 14 to 6. In the MYMUT library, VAL at position 15 is approximately twice as prevalent as ILE. No examples of LEU, PHE, or MET at the P1 position were observed although the MYMUT oligonucleotide has the potential to encode these residues at P1. This is consistent with the observation that BPTI variants with single amino acid substitutions of LEU, PHE, or MET for LYS₁₅ exhibit a significantly lower affinity for HNE than their counterparts containing either VAL or ILE (BECK88b).

PHE is strongly favored at position 17, appearing in 12 of 20 codons. MET is the second most prominent residue at this position but it only appears when VAL is present at position 15. At position 18 PHE was observed in all 20 clones sequenced even though the MYMUT oligonucleotide is capable of encoding other residues at this position. This result is quite surprising and could not be predicted from previous mutational analysis of BPTI, model building, or on any theoretical grounds. We infer that the presence of PHE at position 18 significantly enhances the ability of each of the EpiNEs to bind to HNE. Finally, at position 19, PRO appears in 10 of 20 codons while SER, the second most prominent residue, appears at 6 of 20 codons. Of the residues targeted for mutagenesis in the present study, residue 19 is the nearest to the edge of the interaction surface of a PEPI with HNE. Nevertheless, a preponderance of PRO is observed and may indicate that PRO at 19, like PHE at 18, enhances the binding of these proteins to HNE. Interestingly, EpiNE5 appears only once and differs from EpiNE1 only at position 19; similarly, EpiNE6 differs from EpiNE3 only at position 19. These alterations may have only a minor effect on the ability of these proteins to interact with HNE. This is supported by the fact that the pH elution profiles for EpiNE5 and EpiNE6 are very similar to those of EpiNE1 and EpiNE3 respectively.

Only EpiNE2 and EpiNE8 exhibit pH profiles which differ from those of the other selected clones. Both clones contain LYS at position 19, which may restrict the interaction of BPTI with HNE. However, we cannot exclude the possibility that other alterations within EpiNE2 and EpiNE8 (R15L and Y21S respectively) influence their affinity for HNE.

EpiNE7 was expressed as a soluble protein and analyzed for HNE inhibition activity by the fluorometric assay of Castillo et al. (CAST79); the data were analyzed by the method of Green and Work (GREE53). Preliminary results indicate that K_(d) (HNE,EpiNE7) ≤ 8·10⁻¹² M, i.e. at least 7.5-fold lower than the lowest K_(d) reported for a BPTI derivative with respect to HNE.

C. Summary

Taken together, these data show that the alterations which appear in the P1 region of the EPI mutants confer the ability to bind to HNE and hence be selected through the fractionation process. That the sequences of EpiNE1, EpiNE3, and EpiNE7 appear frequently in the population of selected clones suggests that these clones display BPTI variants with the highest affinity for HNE of any of the 1000 potentially different variants in the MYMUT library. Furthermore, that pH conditions less than 4.0 are required to elute these fusion phage from immobilized HNE suggests that they display BPTI variants having a higher affinity for HNE than BPTI(K15V,R17L). EpiNE7 exhibits a lower K_(d) toward HNE than does BPTI(K15V,R17L); EpiNE1 and EpiNE3 are also expected to exhibit lower K_(d)s for HNE than BPTI(K15V,R17L). It is possible that all of the listed EpiNEs have lower K_(d)s than BPTI(K15V,R17L).

Position 18 has not previously been identified as a key position in determining specificity or affinity of aprotinin homologues or derivatives for particular serine proteases. None have reported or suggested that phenylalanine at position 18 will confer specificity and high affinity for HNE. One of the powerful advantages of the present invention is that many diverse amino-acid sequences may be tested simultaneously.

EXAMPLE V SCREENING OF THE MYMUT LIBRARY FOR BINDING TO CATHEPSIN G BEADS

We fractionated the MYMUT library over immobilized human Cathepsin G to find an engineered protease inhibitor having high affinity for Cathepsin G, hereafter designated as an EpiC. The details of phage binding, elution of bound phage with buffers of decreasing pH (pH profile), titering of the phage contained in these fractions, composition of the MYMUT library, and the preparation of cathepsin G (Cat G) beads are essentially the same as detailed in Example IV.

pH profiles for the binding of two starting controls, BPTI-III MK and EpiNE1, are shown in FIG. 10. BPTI-III MK phage, which contains wild type BPTI fused to the III gene product, shows no apparent binding to Cat G beads in this assay. EpiNE1 phage was obtained by enrichment with HNE beads (Example IV and Table 208). EpiNE1-III MK demonstrated little binding to Cat G beads in the assay, although a small peak or shoulder is visible in the pH 5 eluted fraction.

FIG. 11 shows the pH profiles of the MYMUT library phage when bound to Cat G beads. Library-Cat G interaction was monitored using three cycles of binding, pH elution, transduction of the pH 2 eluted phage, growth of the transduced phage and rebinding of any selected phage to Cat G beads, following exactly the procedure used to find variants of BPTI which bound to HNE. In contrast to the pH profiles elicited with HNE beads, little enhancement of binding was observed for the same phage library when cycled with Cat G beads (with the exception of a possible `shoulder` developing in the pH 5 elutions).

To investigate the elution profile around the pH 5 point in more detail, the binding of phage taken from the pH 4 eluted fraction (bound to Cat G beads) rather than the previously used pH 2 fraction was examined. FIG. 12 demonstrates a marked enhancement of phage binding to the Cat G beads with an apparent elution peak of pH 5. The binding, as a fraction of the input phage population, increased with subsequent binding and elution cycles.

Individual phage clones were picked, grown and analyzed for binding to Cat G beads. FIG. 13 shows the binding and pH profiles for the individual Cat G binding clones (designated EpiC variants). All clones exhibited minor peaks, superimposed upon a gradual fall in bound phage, at pH elutions of 5 (clone 1, SEQ ID NO: 54 and 117; clone 8, SEQ ID NO: 56 and 118; clone 10, SEQ ID NO: 57 and 120; and clone 11, SEQ ID NO: 54 and 117) or pH 4.5 (clone 7, SEQ ID NO: 55 and 118).

DNA sequencing of the EpiC clones, shown in Table 209, demonstrated that the clones selected for binding to Cat G beads represented a distinct subset of the available sequences in the MYMUT library and a cluster of sequences different from that obtained when enriched with HNE beads. The P1 residue in the EpiC mutants is predominantly MET, with one example of PHE, while in BPTI it is LYS and in the EpiNE variants it is either VAL or LEU. In the EpiC mutants residue 16 is predominantly ALA with one example of GLY, and residue 17 is PHE, ILE or LEU. Interestingly, residues 16 and 17 appear to pair off by complementary size, at least in this small sample. The small GLY residue pairs with the bulky PHE while the relatively larger ALA residue pairs with the less bulky LEU and ILE. The majority of the available residues in the MYMUT library for positions 18 and 19 are represented in the EpiC variants.

Hence, a distinct subset of related sequences from the MYMUT library has been selected for and demonstrated to bind to Cat G. A comparison of the pH profiles elicited for the EpiC variants with Cat G and the EpiNE variants for HNE indicates that the EpiNE variants have a high affinity for HNE while the EpiC variants have a moderate affinity for Cat G. Nonetheless, the starting molecule, BPTI, has virtually no detectable affinity for Cat G, and the selection of clones with a moderate affinity is a significant finding.

EXAMPLE VI SECOND ROUND OF VARIEGATION OF EpiNE7 TO ENHANCE BINDING TO HNE

A. MUTAGENESIS OF EpiNE7 PROTEIN IN THE LOOP COMPRISING RESIDUES 34-41

In Example IV, we described engineered protease inhibitors EpiNE1 through EpiNE8 (SEQ ID NO: 46 through 53) that were obtained by affinity selection. Modeling of the structure of the BPTI-Trypsin complex (Brookhaven Protein Data Bank entry 1TPA) indicates that the EpiNE protein surface that interacts with HNE is formed not only by residues 15-19 but also by residues 34-40 that are brought close to this primary loop when the protein folds (HUBE74, HUBE75, OAST88). Acting upon this assumption, we changed amino acid residues in a second loop of the EpiNE7 protein to find EpiNE7 (SEQ ID NO: 48) derivatives having higher affinity for HNE.

In the complex of BPTI and trypsin found in Brookhaven Protein Data Bank entry 1TPA ("1TPA complex"), VAL₃₄ contacts TYR₁₅₁ and GLN₁₉₂. (Residues in trypsin or HNE are underscored to distinguish them from the inhibitor.) In HNE, the corresponding residues are ILE₁₅₁ and PHE₁₉₂. ILE is smaller and more hydrophobic than TYR. PHE is larger and more hydrophobic than GLN. Neither of the HNE side groups can form hydrogen bonds. When side groups larger than that of VAL are substituted at position 34, interactions with residues other than 151 and 192 may be possible. In particular, an acidic residue at 34 might interact with ARG₁₄₇ of HNE that corresponds to SER₁₄₇ of trypsin in 1TPA. Table 15 shows that, in 59 homologues of BPTI, 13 different amino acids have been seen at position 34. Thus we allow all twenty amino acids at 34.

Position 36 is not highly varied; only GLY, SER, and ARG have been observed, with GLY by far the most prevalent. In the 1TPA complex, GLY₃₆ contacts HIS₅₇ and GLN₁₉₂. HIS₅₇ is conserved and GLN₁₉₂ corresponds to PHE₁₉₂ of HNE. Adding a methyl group to GLY₃₆ could increase hydrophobic interactions with PHE₁₉₂ of HNE. GLY₃₆ is in a conformation that most amino acids can achieve: φ=-79° and ψ=-9° (Deisenhofer cited in CREI84, p. 222).

In the 1TPA complex, ARG₃₉ contacts SER₉₆, ASN₉₇, THR₉₈, LEU₉₉ (SEQ ID NO: 13), GLN₁₇₅, and TRP₂₁₅. In HNE, all of the corresponding residues are different! SER₉₆ is deleted; ASN₉₇ corresponds to ASP₉₇ (bearing a negative charge); THR₉₈ corresponds to PRO₉₈; LEU₉₉ corresponds to the residues VAL₉₉, ASN_(99a), and LEU_(99b); GLN₁₇₅ is deleted; and TRP₂₁₅ corresponds to PHE₂₁₅. Position 39 shows a moderately high degree of variability with 7 different amino acids observed, viz. ARG, GLY, LYS, GLN, ASP, PRO, and MET. Having seen PRO (the most rigid amino acid), GLY (the most flexible amino acid), LYS and ASP (basic and acidic amino acids), we assume that all amino acids are structurally compatible with the aprotinin backbone. Because the context of residue 39 has changed so much, we allow all 20 amino acids.

Position 40 is not highly variable; only GLY and ALA have been observed (with similar frequency, 24:16). Position 41 is moderately varied, showing ASN, LYS, ASP, GLN, HIS, GLU, and TYR. The side groups of residues 40 and 41 are not thought to contact trypsin in the 1TPA complex. Nevertheless, these residues can exert electrostatic effects and can influence the dynamic properties of residues 39, 38, and others. The choice of residues 34, 36, 39, 40, and 41 to be varied simultaneously illustrates the rule that the varied residues should be able to touch one molecule of the target material at one time or be able to influence residues that touch the target. These residues are not contiguous in sequence, nor are they contiguous on the surface of EpiNE7. They can, nonetheless, all influence the contacts between the EpiNE and HNE.

Amino acid residues VAL₃₄, GLY₃₆, MET₃₉, GLY₄₀, and ASN₄₁ were variegated as follows: any of 20 genetically encodable amino acids at positions 34 and 39 (NNS codons in which N is approximately equimolar A, C, T, G and S is approximately equimolar C and G), GLY or ALA at positions 36 and 40 (GST codon), and [ASP, GLU, HIS, LYS, ASN, GLN, TYR, or stop] at position 41 (NAS codon). Because the PEPIs are displayed fused to gIII protein, DNA containing stop codons will not give rise to infectious phage in non-suppressor hosts.

For cassette mutagenesis, a 61 base long oligonucleotide DNA population was synthesized that contained 32,768 different DNA sequences coding on expression for a total of 11,200 amino acid sequences. This oligonucleotide extends from the third base of codon 51 in Table 113 (the middle of the StuI site) to base 2 of codon 70 (the EagI site (identified as XmaIII in Table 113)).
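The two library sizes quoted above follow directly from the codon scheme, as the following sketch illustrates. It expands the degenerate codons NNS, GST, and NAS against the standard genetic code and multiplies the per-position counts. The helper names (`expand`, `encoded`, `SCHEME`) are illustrative and are not taken from the specification.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Degenerate symbols used in the text: N = A/C/G/T, S = C/G.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "S": "CG"}


def expand(codon: str) -> list[str]:
    """List every concrete codon covered by a degenerate codon."""
    return ["".join(t) for t in product(*(DEGENERATE[ch] for ch in codon))]


def encoded(codon: str) -> set[str]:
    """One-letter amino acids ('*' = stop) encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand(codon)}


# Variegation scheme from the text: residue position -> degenerate codon.
SCHEME = {34: "NNS", 36: "GST", 39: "NNS", 40: "GST", 41: "NAS"}

dna_sequences = 1
protein_sequences = 1
for position, codon in SCHEME.items():
    dna_sequences *= len(expand(codon))
    protein_sequences *= len(encoded(codon) - {"*"})  # stops are not displayed

print(dna_sequences, protein_sequences)
```

Stop codons are excluded from the protein count because, as stated above, DNA containing stop codons does not give rise to infectious phage in non-suppressor hosts.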

We used a mutagenesis method similar to that described by Cwirla et al. (CWIR90) and other standard DNA manipulations described in Maniatis et al. (MANI82) and Sambrook et al. (SAMB89). EpiNE7 RF DNA was restricted with EagI and StuI, agarose gel purified, and dephosphorylated using HK(™) phosphatase (Epicentre Technologies). We prepared insert by annealing two small, 16 base and 17 base, phosphorylated synthetic DNA primers to the phosphorylated 61 base long oligonucleotide population described above. The resulting insert DNA population had the following features: double stranded DNA ends capable of regenerating upon ligation the EagI (5' overhang) and StuI (blunt) restricted sites of the EpiNE7 RF DNA, and single stranded DNA in the central mutagenic region. Insert and EpiNE7 vector DNA were ligated. Ligation samples were used to transfect competent XL1-Blue(™) cells which were subsequently plated for formation of ampicillin resistant (Ap^(R)) colonies. The resulting phage-producing, Ap^(R) colonies were harvested and recombinant phage was isolated. By following these procedures, a phage library of 1.2·10⁵ independent transformants was assembled. We estimated that 97.4% of the approximately 3.3·10⁴ possible DNA sequences were represented:

    0.974 = 1 - exp(-1.2·10⁵ / 32768)

The probability of observing the parental sequence is higher than 0.974 because VAL occurs twice in the NNS codon: ##EQU4## Furthermore, we expect that a small amount (for example, 1 part in 1000) of uncut or once-cut and religated parental vector would come through the procedures used. Thus the parental sequence is almost certainly present in the library. This library is designated the KLMUT library.
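The coverage estimate can be reproduced with a short sketch of the same Poisson-type formula, 1 - exp(-N/S), where N is the number of independent transformants and S the number of equally likely DNA sequences. The `copies` parameter is an addition for illustration. The value of 2 for the parental protein is an assumption derived from the codon scheme (two NNS codons for VAL, one codon for each of the other four parental residues), and it is not a reconstruction of the equation missing from the text.

```python
from math import exp


def fraction_represented(
    transformants: float, equally_likely_sequences: int, copies: int = 1
) -> float:
    """Probability that a sequence encoded by `copies` DNA variants is present."""
    return 1.0 - exp(-copies * transformants / equally_likely_sequences)


TRANSFORMANTS = 1.2e5  # independent transformants (from the text)
DNA_SEQUENCES = 32768  # possible DNA sequences (from the text)

# Any single DNA sequence, as in the estimate given in the text.
any_single = fraction_represented(TRANSFORMANTS, DNA_SEQUENCES)

# Parental protein, assuming two DNA sequences encode it.
parental = fraction_represented(TRANSFORMANTS, DNA_SEQUENCES, copies=2)

print(f"single DNA sequence: {any_single:.3f}")
print(f"parental protein:    {parental:.4f}")
```

The formula treats every DNA sequence as equally abundant in the synthesized oligonucleotide population, which the text describes only as approximately true.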

B. AFFINITY SELECTION WITH IMMOBILIZED HUMAN NEUTROPHIL ELASTASE

1) First Fractionation

We added 1.1·10⁸ plaque forming units of the KLMUT library to 10 μl of a 50% slurry of agarose-immobilized human neutrophil elastase beads (HNE from Calbiochem cross-linked to Reacti-Gel(™) agarose beads from Pierce Chemical Co. following manufacturer's directions) in TBS/BSA. Following 3 hours incubation at room temperature, the beads were washed and phage was eluted as done in the selection of EpiNE phage isolates (Example IV). The progression in lowering pH during the elution was: pH 7.0, 6.0, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, and 2.0. Beads carrying phage remaining after pH 2.0 elution were used to infect XL1-Blue(™) cells that were plated to allow plaque formation. The 348 resulting plaques were pooled to form a phage population for further affinity selection. A population of phage particles containing 6.0·10⁸ plaque forming units was added to 10 μl of a 50% slurry of agarose-immobilized HNE beads in TBS/BSA and the above selection procedure was repeated.

Following this second round of affinity selection, a portion of the beads was mixed with XL1-Blue(™) cells and plated to allow plaque formation. Of the resulting plaques, 480 were pooled to form a phage population for a third affinity selection. We repeated the selection procedure described above using a population of phage particles containing 3.0·10⁹ plaque forming units. Portions of the pH 2.0 eluate and of the beads were plated with XL1-Blue(™) cells to allow formation of plaques. Individual plaques were picked for preparation of RF DNA. From DNA sequencing, we determined the amino acid sequence in the mutated secondary loop of 15 EpiNE7-homolog clones. The sequences are given in Table 210 as EpiNE7.1 through EpiNE7.20 (SEQ ID NO: 59 through 70). Three sequences were observed twice: EpiNE7.4 and EpiNE7.14 (SEQ ID NO: 63); EpiNE7.8 and EpiNE7.9 (SEQ ID NO: 60); and EpiNE7.10 and EpiNE7.20 (SEQ ID NO: 65). EpiNE7.4 was eluted at pH 2 while EpiNE7.14 was obtained by culturing HNE beads that had been washed with pH 2 buffer. Similarly, EpiNE7.10 came from pH 2 elution but EpiNE7.20 came from beads. EpiNE7.8 and EpiNE7.9 both came from pH 2 elution. Interestingly, EpiNE7.8 is found in both the first and second fractionations (EpiNE7.31 (vide infra)).

2) Second Fractionation

The purpose of affinity fractionation is to reduce diversity on the basis of affinity for the target. The first enrichment step of the first fractionation reduced the population from 3·10⁴ possible DNA sequences to no more than 348. This might be too severe and some of the loss of diversity might not be related to affinity. Thus we carried out a second fractionation of the entire KLMUT library seeking to reduce the diversity more gradually.
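The severity of this bottleneck can be put in numbers with a minimal sketch. It uses the exact count of 32,768 DNA sequences given earlier rather than the rounded 3·10⁴, and it treats the 348 plaques as uniform random draws. The uniform-draw model is an illustrative assumption that corresponds to the case in which the loss of diversity is unrelated to affinity.

```python
def expected_distinct(sequences: int, draws: int) -> float:
    """Expected number of distinct sequences among uniform random draws."""
    return sequences * (1.0 - (1.0 - 1.0 / sequences) ** draws)


POSSIBLE_DNA_SEQUENCES = 32768  # from the description of the KLMUT library
PLAQUES_POOLED = 348            # plaques pooled after the first enrichment step

# At most one sequence per plaque can survive the bottleneck.
upper_bound_fraction = PLAQUES_POOLED / POSSIBLE_DNA_SEQUENCES

# With no selection at all, nearly every plaque would carry a different sequence.
expected = expected_distinct(POSSIBLE_DNA_SEQUENCES, PLAQUES_POOLED)

print(f"fraction of library retained: at most {upper_bound_fraction:.2%}")
print(f"distinct sequences expected without selection: {expected:.1f}")
```

Roughly 1% of the possible sequences, at most, could pass through this step, which is why a more gradual second fractionation was attempted.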

We added 2.0·10¹¹ plaque forming units of the KLMUT library to 10 μl of a 50% slurry of agarose-immobilized HNE beads in TBS/BSA. Following 3 hours incubation at room temperature, phage were eluted as described above. We then transduced XL1-Blue(™) cells with portions of the pH 2.0 eluate and plated for Ap^(R) colonies.

The resulting phage-producing colonies were harvested to obtain amplified phage for further affinity selection. A population of these phage particles containing 2.0·10¹⁰ plaque forming units was added to 10 μl of a 50% slurry of agarose-immobilized HNE beads in TBS/BSA and incubated for 90 minutes at room temperature. Phage were eluted as described above and portions of the pH 2.0 eluate were used to transduce XL1-Blue(™) cells. We plated the transductants for Ap^(R) colonies and obtained amplified phage from the harvested colonies.

In a third round of affinity selection, a population of phage particles containing 3.0·10¹⁰ plaque forming units was added to 20 μl of a 50% slurry of agarose-immobilized HNE beads and incubated for 2 hours at room temperature. We eluted the phage with the following pH washes: pH 7.0, 6.0, 5.0, 4.5, 4.0, 3.5, 3.25, 3.0, 2.75, 2.5, 2.25, and 2.0. After plating a portion of the pH 2.0 eluate fraction for plaque formation, we picked individual plaques for preparation of RF DNA. DNA sequencing yielded the amino acid sequence in the mutated secondary loop for 20 EpiNE7 homolog clones. These sequences, together with EpiNE7 (SEQ ID NO: 48), are given in Table 210 as EpiNE7.21 through EpiNE7.40 (SEQ ID NO: 7 through 87). The plaques observed when EpiNEs are plated display a variety of sizes. EpiNE7.21 through EpiNE7.30 (SEQ ID NO: 71 through ) were picked with attention to plaque size: 7.21, 7.22, and 7.23 from small plaques, 7.24 through 7.30 from plaques of increasing size, with 7.30 coming from a large plaque. TRP occurs at position 39 in EpiNE7.21, 7.22, 7.23, 7.25, and 7.30. Thus plaque size does not correlate with the appearance of TRP at 39. One sequence, EpiNE7.31, from this fractionation is identical to sequences EpiNE7.8 and EpiNE7.9 obtained in the first fractionation. EpiNE7.30, EpiNE7.34, and EpiNE7.35 are identical, indicating that the diversity of the library has been greatly reduced. It is believed that these sequences have an affinity for HNE that is at least comparable to that of EpiNE7 and probably higher. Because the parental EpiNE7 sequence did not recur, it is quite likely that some or all of the EpiNE7.nn derivatives have higher affinity for HNE than does EpiNE7.

3) Conclusions

One can draw some conclusions. First, because some sequences have been isolated repeatedly, the fractionation is nearly complete. The diversity has been reduced from ≥10⁴ to a few tens of sequences.

Second, the parental sequence has not recurred. At 39, MET did not occur! At position 34 VAL occurred only once in 35 sequences. At 41, ASN occurred only 4 of 35 times. At 40, GLY occurred 17 of 35 times. At position 36, GLY occurred 34 of 35 times, indicating that ALA is undesirable here. EpiNE7.24 (SEQ ID NO: 74) and EpiNE7.36 (SEQ ID NO: 83) are most like EpiNE7, having three of the varied residues identical to EpiNE7.

Third, the results of the first and second fractionation are similar. In the second fractionation, the prevalence of TRP at position 39 is more marked (5/15 in fractionation #1, 14/20 in #2). It is possible that the first fractionation lost some high-affinity EPIs through under-sampling. Nevertheless, the first fractionation was clearly quite successful.

Fourth, there are strong preferences at positions 39 and 36 and lesser but significant preferences at positions 34 and 41, with little preference at 40.

Heretofore, no homologues of aprotinin have been reported having ALA at 36. In the selected EpiNE7.nn sequences, the preference for GLY over ALA at position 36 is 34:1. This preference is probably not due to differences in protein stability. The process of the present invention, as applied in the present example, does not select against proteins on the basis of stability so long as the protein does fold and function at the temperature used in the procedure. ALA is probably tolerated at position 36 well enough to allow those proteins having ALA₃₆ to fold and function; one example was found having ALA₃₆. It may be relevant that the sole sequence having ALA₃₆ also has GLY₃₄. The flexibility of GLY at 34 may allow the methyl of ALA at 36 to fit into HNE in a way that is not possible when other amino acids occupy position 34.

At position 39, all 20 amino acids were allowed, but only seven were seen. TRP is strongly preferred with 19 occurrences, HIS second with six occurrences, and LEU third with 5 occurrences. No homologues of aprotinin have been reported having either TRP or HIS at position 39 as are now disclosed. Although LEU is represented in the NNS codon thrice, TRP and HIS have but one codon each and their prevalence is surprising. We constructed a model having HNE (Brookhaven Protein Data Bank entry 1HNE) and EpiNE7.9 (SEQ ID NO: 60) spatially related as in the 1TPA complex. (The α carbons of HNE of conserved internal residues were superimposed on the corresponding α carbons of trypsin, rms deviation ≈0.5 Å.) Inspection of this model indicates that TRP₃₉ could interact with the loop of HNE that comprises VAL₉₉, ASN_(99a), and LEU_(99b). HIS is observed in six cases; HIS is hydrophobic, aromatic, and in some ways similar to TRP. LEU₃₉ in EpiNE7.5 could also interact with these residues if the loop moves a short distance. GLU occurred twice while LYS, ARG, and GLN occurred once each. In BPTI, the C_(α) of residue 39 is ≈10 Å from the C_(α) of residue 15 so that TRP₃₉ interacts with different features of HNE than do the amino acids substituted at position 15. Residue 34 is well separated from each of the residues 15, 18, and 39; thus it contacts different features on the HNE surface from these residues. Although serine proteases are highly similar near the catalytic site, the similarity diminishes rapidly outside this conserved region. The specificity of serine proteases is in fact determined by more interactions than the P1 residue. To make an inhibitor that is highly specific to HNE, we must go beyond matching the requirement at P1. Thus, the substitutions at 18 (determined in Example IV), 39, 34, and other non-P1 positions are invaluable in customizing the EpiNE to HNE. When making an inhibitor customized to a different serine protease, it is likely that many, if not all, of these positions will be changed to obtain high affinity and specificity. It is a major advantage of the present method that many such derivatives may be tested rapidly.
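The remark that the prevalence of TRP and HIS is surprising can be illustrated by comparing the observed frequencies at position 39 with the frequencies expected from the NNS codon alone. The sketch below assumes that all 32 NNS codons are equally represented in the unselected library, which the text describes as only approximately true. The ratio it computes is an illustrative measure and not one used in the specification.

```python
from collections import Counter

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Number of NNS codons (third base C or G) per amino acid; '*' is stop.
nns_codons = Counter(
    AMINO[16 * i + 4 * j + k]
    for i in range(4)
    for j in range(4)
    for k, third in enumerate(BASES)
    if third in "CG"
)
total_codons = sum(nns_codons.values())

# Occurrences at position 39 among the 35 sequenced clones (from the text):
# TRP 19, HIS 6, LEU 5, GLU 2, LYS 1, ARG 1, GLN 1.
observed = {"W": 19, "H": 6, "L": 5, "E": 2, "K": 1, "R": 1, "Q": 1}
total_clones = sum(observed.values())

# Ratio of observed frequency to the frequency expected from codon usage.
enrichment = {
    aa: (count / total_clones) / (nns_codons[aa] / total_codons)
    for aa, count in observed.items()
}

for aa, ratio in sorted(enrichment.items(), key=lambda item: -item[1]):
    print(f"{aa}: {ratio:.1f}-fold relative to NNS expectation")
```

On this measure LEU, despite ranking third by count, is only modestly over-represented once its three codons are taken into account, whereas TRP and HIS stand well above expectation.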

At position 34, all 20 amino acids were allowed. Fourteen have been seen. LYS appeared seven times, GLU five times, THR four times, LEU three times, GLY, ASP, GLN, MET, ASN, and HIS twice each, and ARG, PRO, VAL, and TYR once each. There were no instances of ALA, CYS, PHE, ILE, SER, or TRP. No homologue of aprotinin with GLU, GLY, or MET at 34 has been reported heretofore. Here, as at position 39, the library contains an excess of LEU over LYS and GLU. Thus, we infer that the prevalence of LYS, GLU, THR, and LEU is related to tighter binding of EpiNEs having these amino acids at position 34. The prevalence of LYS is surprising, as there are no acidic groups on HNE in the neighborhood. The N_(zeta) of LYS₃₄ could interact with a main-chain carbonyl oxygen while the methylene groups interact with ILE₁₅₁ and/or PHE₁₉₂. LEU₃₄ could interact with ILE₁₅₁ and/or PHE₁₉₂ while GLU₃₄ could interact with ARG₁₄₇.

There has been little if any enrichment at positions 40 and 41. Alanine is somewhat preferred at 40; ALA:GLY::18:17. Both ALA and GLY have been reported in aprotinin homologues.

Position 41 shows a preponderance of LYS (12 occurrences) and GLU (7), but all seven amino acids encoded by the NAS codon have been seen. The overall distribution is LYS¹², GLU⁷, ASP⁴, ASN⁴, GLN³, HIS³, and TYR². Heretofore, no homologues of aprotinin having GLU, GLN, HIS, or TYR at position 41 have been reported.

One sequence, EpiNE7.25 (SEQ ID NO: 75), contains an unexpected change at position 47, SER to LEU. Heretofore, all homologues of aprotinin reported have had either SER or THR at position 47. The side groups of SER and THR can form hydrogen bonds to main-chain atoms at the beginning of the short α helix.

The consensus sequence, LYS₃₄, GLY₃₆, TRP₃₉, ALA₄₀, LYS₄₁, was not observed. EpiNE7.23 (SEQ ID NO: 73) is quite close, differing only at position 40, where the preference for ALA is very weak.
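The consensus can be recomputed from the per-position counts reported in this example. The sketch below simply collects those counts, checks that each position accounts for all 35 sequenced clones, and takes the most frequent residue at each position. The dictionary layout and variable names are illustrative.

```python
# Occurrences of each residue among the 35 sequenced clones, as reported
# in the text for the five variegated positions.
observed = {
    34: {"LYS": 7, "GLU": 5, "THR": 4, "LEU": 3, "GLY": 2, "ASP": 2, "GLN": 2,
         "MET": 2, "ASN": 2, "HIS": 2, "ARG": 1, "PRO": 1, "VAL": 1, "TYR": 1},
    36: {"GLY": 34, "ALA": 1},
    39: {"TRP": 19, "HIS": 6, "LEU": 5, "GLU": 2, "LYS": 1, "ARG": 1, "GLN": 1},
    40: {"ALA": 18, "GLY": 17},
    41: {"LYS": 12, "GLU": 7, "ASP": 4, "ASN": 4, "GLN": 3, "HIS": 3, "TYR": 2},
}

# Each position should account for every sequenced clone.
totals = {position: sum(counts.values()) for position, counts in observed.items()}

# Consensus: the most frequent residue at each position.
consensus = {
    position: max(counts, key=counts.get) for position, counts in observed.items()
}

# Share of clones carrying the consensus residue, as a measure of preference.
share = {
    position: observed[position][residue] / totals[position]
    for position, residue in consensus.items()
}

for position in sorted(observed):
    print(f"{position}: {consensus[position]} ({share[position]:.0%} of clones)")
```

The shares make the ranking of preferences explicit: positions 36 and 39 are strongly constrained, while position 40 is close to an even split between ALA and GLY.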

We tested EpiNE7.23 (the sequence closest to consensus) against EpiNE7 (SEQ ID NO: 48) on HNE beads. FIG. 16 shows the fractionation of strains of phage that display these two EpiNEs. Phage that display EpiNE7 are eluted at higher pH than are phage that display EpiNE7.23. Furthermore, more of the EpiNE7.23 phage are retained than of the EpiNE7 phage. Note the peak at pH 2.25 in the EpiNE7.23 elution. This suggests that EpiNE7.23 has a higher affinity for HNE than does EpiNE7. In a similar way, we tested EpiNE7.4 (SEQ ID NO: 63) and found that it is not retained on HNE so well as EpiNE7. This is consistent with the fractionation not being complete.

Further fractionation, characterization of clonally pure EpiNE7.nn strains, and biochemical characterization of soluble EpiNE7.nn derivatives will reveal which sequences in this collection have the highest affinity for HNE.

Fractionation of the library involves a number of factors. Differential binding allows phage that display PBDs having the desired binding properties to be enriched. Differences in infectivity, plaque size, and phage yield are related to differences in the sequence of the PBDs, but are not directly correlated to affinity for the target. These factors may reduce the effectiveness of the desired fractionation. An additional factor that may be present is differential abundance of PBD sequences in the initial library. One step we employ to reduce the effect of differential infectivity is to transduce cells with isolated phage rather than to infect them. In the first fractionation, we did not obtain sufficient material for transduction and so infected cells; this fractionation was successful. Because the parental sequence, EpiNE7, was selected for a sequence at residues 15 through 19 that confers high affinity for HNE, we believe that many, if not most, members of the KLMUT population have significant affinity for HNE. Thus the present fractionations must separate variants having very high affinity for HNE from those merely having high affinity for HNE. It is perhaps relevant that BPTI-III MK phage are only partially eluted from immobilized trypsin at pH 2.2; K_(d) (trypsin,BPTI) = 6.0·10⁻¹⁴ M. Elution of EpiNE7-III MA phage from immobilized HNE gives a peak at about pH 3.5 with some phage appearing at lower pH; K_(d) (HNE,EpiNE7) ≤ 1·10⁻¹¹ M. We recycled phage that either were eluted at pH 2.0 or that were retained after elution with pH 2.0 buffer. A large percentage of EpiNE7-III MA phage would have been washed away with the fractions at pHs less acid than 2.0. This, together with the marked preferences at positions 39, 36, and 34, strongly suggests that we have successfully fractionated the KLMUT library on the basis of affinity for HNE and that the EpiNE7.nn proteins have higher affinity for HNE than does EpiNE7 or any other reported aprotinin derivative.

Fractionation in a few stringent steps emphasizes the affinity of the PBD and allows isolation of variants that confer a small-plaque phenotype on cells (through low infectivity or by slowing cell growth). More gradual fractionation allows observation of a wider variety of variants that show high affinity and favors sequences that start at low abundance. Gradual fractionation also favors selection of variants that do not confer a small-plaque phenotype; such variants may be easier to work with and are preferred for some purposes. In either case, it is preferred to fractionate until there is a manageable number of distinct isolates and to characterize these isolates as pure clones. Thus, it is desirable, in most cases, to fractionate a library in more than one way.

None have identified positions 39 and 34 as key in determining the affinity and specificity of aprotinin homologues and derivatives for particular serine proteases. None have suggested that tryptophan at 39 or charged amino acids (LYS or GLU) at 34 will enhance binding of an aprotinin homologue to HNE. Different substitutions at these positions are likely to confer different specificity on those derivatives. One of the major advantages of the present invention is that many substitutions at several locations may be tested with an amount of effort not much greater than is required to test a single derivative by previously used methods.

There exist a number of proteases produced by lymphocytes. Neutrophil elastase is not the only lymphocytic protease that degrades elastin. The protease p29 is related to HNE. Screening the MYMUT and KLMUT libraries against immobilized p29 is likely to allow isolation of an aprotinin derivative having high affinity for p29.

EXAMPLE VII BPTI:VIII BOUNDARY EXTENSIONS

The aim of this work was to introduce peptide extensions between the C-terminus of the BPTI domain and the N-terminus of the M13 major coat protein within the fusion protein. The reasons for this were twofold: first, to alter potential protease cleavage sites at the interdomain boundary (as evidenced by an apparent instability of the fusion protein), and second, to increase interdomain flexibility.

1. Insertion of a variegated pentapeptide at the BPTI:VIII interface

The gene shown in Table 113 was modified by insertion of five RVT codons between codons 81 and 82. Two synthetic oligonucleotides were designed and custom synthesized. The first consisted of, from 5' to 3': a) from base 2 of codon 77 to the end of codon 81, b) five copies of RVT, and c) from codon 82 to the second base of codon 94. The second comprised 20 bases complementary to the 3' end of the first oligonucleotide. Each RVT codon allows one of the amino acids [T, N, S, A, D, and G] to be encoded. This variegation codon was picked because: a) each amino acid occurs once, and b) all these amino acids are thought to foster a flexible linker. When annealed, the primed variegated oligonucleotide was converted to double-stranded DNA using standard methods.
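The claim that each RVT codon encodes exactly one of six amino acids, each once, can be checked by expanding the degenerate codon. The following is a minimal sketch, assuming the usual IUPAC meanings R = A or G and V = A, C, or G and the standard genetic code; the function and variable names are illustrative only.

```python
from itertools import product

# IUPAC degenerate bases appearing in the RVT codon (T is unambiguous).
IUPAC = {"R": "AG", "V": "ACG", "T": "T"}

# Standard genetic code entries for the codons that RVT can produce.
CODON_TO_AA = {
    "ACT": "T", "AAT": "N", "AGT": "S",
    "GCT": "A", "GAT": "D", "GGT": "G",
}

def expand(degenerate_codon):
    """Return every concrete codon matched by a degenerate codon."""
    choices = (IUPAC[base] for base in degenerate_codon)
    return ["".join(bases) for bases in product(*choices)]

codons = expand("RVT")
amino_acids = [CODON_TO_AA[c] for c in codons]

# Five independent RVT codons give 6**5 distinct pentapeptides.
n_pentapeptides = len(set(amino_acids)) ** 5

print(codons)
print(amino_acids)
print(n_pentapeptides)
```

Because each of the six codons maps to a different amino acid, the number of DNA sequences and the number of encoded pentapeptides are equal, so no peptide is over-represented at the DNA level.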

The duplex was digested with restriction enzymes SfiI and NarI and the resulting 45 base-pair fragment was ligated into a similarly cleaved OCV, M13MB48 (Example I.1.iii.a). The ligated material was transfected into competent E. coli cells (strain XL1-Blue™) and plated onto a lawn of the same cells on normal bacterial growth plates to form plaques. The bacteriophage contained within the plaques were analyzed using standard methods of nitrocellulose lifts and probing using a ³²P-labeled oligonucleotide complementary to the DNA sequence encoding the fusion protein interface. Approximately 80% of the plaques probed poorly with this oligonucleotide and hence contained new sequences at this position.

A pool of phages, containing the novel interface pentapeptide extensions, was collected by combining the phage extracted from the plated plaques.

2. Adding multiple unit extensions to the fusion protein interface.

The M13 gene III product contains `stalk-like` regions as implied by electron micrographic visualization of the bacteriophage (LOPE85). The predicted amino acid sequence of this protein contains repeating motifs, which include:

glu.gly.gly.gly.ser (EGGGS) seven times (SEQ ID NO:10)

gly.gly.gly.ser (GGGS) three times (SEQ ID NO:14)

glu.gly.gly.gly.thr (EGGGT) once (SEQ ID NO:15)

The aim of this section was to insert, at the domain interface, multiple unit extensions which would mirror the repeating motifs observed in the III gene product.

Two synthetic oligonucleotides were designed and custom synthesized. GLY is encoded by four codons (GGN); when translated in the opposite direction, these codons give rise to THR, PRO, ALA, and SER. The third base of these codons was picked so that translation of the oligonucleotide in the opposite direction would encode SER. When annealed the synthetic oligonucleotides give the following unit duplex sequence (an EGGGS linker): ##STR27##

The duplex has a common two-base 5' overhang (GC) at either end of the linker, which allows both for the ligation of multiple units and for cloning into the unique NarI recognition sequence present in OCVs M13MB48 and GemMB42. This site is positioned within 1 codon of the DNA encoding the interface. The cloning of an EGGGS linker (SEQ ID NO:10) (or multiple linker) into the vector NarI site destroys this recognition sequence. Insertion of the EGGGS linker in reverse orientation leads to insertion of GSSSL (SEQ ID NO:16) into the fusion protein.
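The relationship between the forward (EGGGS) and reverse (GSSSL) readings can be illustrated by translating the reverse complement of a coding sequence. This is a minimal sketch under stated assumptions: the 15-base sequence below is a hypothetical codon choice that satisfies the constraints described in the text, not the sequence of the unit duplex itself, and the insert is assumed to be read in frame in both orientations.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the standard-code entries needed for this illustration.
CODON_TO_AA = {
    "GAG": "E", "GGA": "G", "GGC": "G", "GGG": "G", "GGT": "G",
    "ACC": "T", "CCC": "P", "GCC": "A", "TCC": "S", "CTC": "L",
}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate an in-frame DNA sequence using the partial table above."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))

# Opposite-strand reading of each of the four GLY codons (GGN).
gly_reverse = {c: translate(reverse_complement(c))
               for c in ("GGA", "GGC", "GGG", "GGT")}

# Hypothetical coding sequence for one EGGGS unit (an assumption).
unit = "GAGGGAGGAGGATCC"
forward = translate(unit)
reverse = translate(reverse_complement(unit))

print(gly_reverse)
print(forward, reverse)
```

With GGA chosen for each GLY codon, the opposite strand reads TCC (SER) at those positions, which is the design choice described above.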

Addition of a single EGGGS linker at the NarI site of the gene shown in Table 113 leads to the following gene: ##STR28##

Note that there is no preselection for the orientation of the linker(s) inserted into the OCV and that multiple linkers of either orientation (with the predicted EGGGS or GSSSL amino acid sequence) or a mixture of orientations (inverted repeats of DNA) could occur.

A ladder of increasingly large multiple linkers was established by annealing and ligating the two starting oligonucleotides containing different proportions of 5' phosphorylated and non-phosphorylated ends. The logic behind this is that ligation proceeds from the 3' unphosphorylated end of an oligonucleotide to the 5' phosphorylated end of another. The use of a mixture of phosphorylated and non-phosphorylated oligonucleotides allows for an element of control over the extent of multiple linker formation. A ladder showing a range of insert sizes was readily detected by agarose gel electrophoresis, spanning 15 bp (1 unit duplex, 5 amino acids) to greater than 600 base pairs (40 ligated linkers, 200 amino acids).

Large inverted repeats can lead to genetic instability. Thus we chose to remove them, prior to ligation into the OCV, by digesting the population of multiple linkers with the restriction enzymes AccIII or XhoI, since the linkers, when ligated `head-to-head` or `tail-to-tail`, generate these recognition sequences. Such a digestion significantly reduces the range in sizes of the multiple linkers to between 1 and 8 linker units (i.e. between 5 and 40 amino acids in steps of 5), as assessed by agarose gel electrophoresis.
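The insert sizes quoted for the ladder follow directly from the unit size of 15 bp (5 amino acids) per linker. A short sketch of that arithmetic, with illustrative names:

```python
UNIT_BP = 15  # base pairs per linker unit duplex, as stated in the text
UNIT_AA = 5   # amino acids encoded per linker unit

def linker_size(n_units):
    """Return (base pairs, amino acids) for n ligated linker units."""
    return n_units * UNIT_BP, n_units * UNIT_AA

# Upper end of the ladder seen before digestion (40 ligated linkers).
ladder_top = linker_size(40)

# Range remaining after AccIII or XhoI digestion (1 to 8 units).
after_digestion = [linker_size(n) for n in range(1, 9)]

print(ladder_top)
print(after_digestion)
```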

The linkers were ligated (as a pool of different insert sizes or as gel-purified discrete fragments) into NarI-cleaved OCVs M13MB48 or GemMB42 using standard methods. Following ligation the restriction enzyme NarI was added to remove the self-ligating starting OCV (since linker insertion destroys the NarI recognition sequence). This mixture was used to transform competent XL1-Blue™ cells and appropriately plated for plaques (OCV M13MB48) or ampicillin-resistant colonies (OCV GemMB42).

The transformants were screened using dot blot DNA analysis with one of two ³²P-labeled oligonucleotide probes. One probe consisted of a sequence complementary to the DNA encoding the P1 loop of BPTI while the second had a sequence complementary to the DNA encoding the domain interface region. Suitable linker candidates would probe positively with the first probe and negatively or poorly with the second. Plaque-purified clones were used to generate phage stocks for binding analyses and BPTI display while the RF DNA derived from phage-infected bacterial cells was used for restriction enzyme analysis and sequencing. Representative insert sequences of selected clones analyzed are as follows: ##STR29## These highly flexible oligomeric linkers are believed to be useful in joining a binding domain to the major coat (gene VIII) protein of filamentous phage to facilitate the display of the binding domain on the phage surface. They may also be useful in the construction of chimeric OSPs for other genetic packages.

EXAMPLE VIII BACTERIAL EXPRESSION VECTORS

The expression vectors were designed for the bacterial production of BPTI analogues resulting from the mutagenesis and screening for variants with specific binding properties. The expression vectors used are derivatives of the OCVs M13MB48 and GemMB42. The conversion was achieved by replacing the first codon of the mature VIII gene (codon 82 as shown in Table 113) with a translational stop codon by site-specific mutagenesis.

The salient points of the expression vector composition are identical to those of the parent OCVs, namely a lacUV5 promoter (hence IPTG induction), ribosome binding site, initiating methionine, phoA signal peptide and transcriptional termination signal (see Table 113). The placement of the stop codon allows for the expression of only the first half of the fusion protein. The Gem-based expression system, containing the genes encoding BPTI analogues, is stored as plasmid DNA, being freshly transfected into cells for expression of the analogue protein. The M13-based expression system is stored as both RF DNA and as phage stocks. The phage stocks are used to infect fresh bacterial cells for expression of the protein of interest.

Bacterial Expression of BPTI and Analogues

i. Gem-based expression vector and protocol

The Gem-based expression vector is a derivative of the OCV GemMB42 (Example I and Table 113). This vector, at least when it contains the BPTI or analogue genes, has demonstrated a degree of insert instability on prolonged growth in liquid culture. To reduce this risk, the following protocol is used.

Expression vector DNA (containing the BPTI or analogue gene) is transfected into the E. coli strain, XL1-Blue™, which is plated on bacterial plates containing ampicillin and allowed to incubate overnight at 37° C. to give a dense population of colonies. The colonies are scraped from the plate with a glass spreader in 1 ml of NZCYM medium and combined with the scraped cells from other duplicate plates. This stock of cells is diluted approximately one hundred fold into NZCYM liquid medium containing ampicillin (100 μg per ml) and allowed to grow in a shaking incubator to a cell density of approximately half log (absorbance of 0.3 at 600 nm). IPTG is added to a final concentration of 0.5 mM and the induced culture allowed to grow for a further two hours, after which it is processed as described below.

ii. M13-based expression vector and protocol

The M13-based expression vector is derived from OCV M13MB48 (Example I). The BPTI gene (or analogue) is contained within the intergenic region and its transcription is under the control of a lacUV5 promoter, hence IPTG inducible. The expression vector, containing the gene of interest, is maintained and utilized as a phage stock. This method enables a potentially lethal or deleterious gene to be supplied to a bacterial culture and gene induction to occur only when the bacterial culture has achieved sufficient mass. Poor growth and insert instability can be circumvented to a large extent, giving this system an advantage over the Gem-based vector described above.

An overnight bacterial culture of XL1-Blue™ or SEF' is grown in LB medium containing tetracycline (50 μg per ml) to ensure the presence of pili as sites for bacteriophage binding and infection. This culture is diluted 100-fold into NZCYM medium containing tetracycline and bacterial growth allowed to proceed in an incubator shaker until a cell density of 1.0 (absorbance at 600 nm) has been achieved. Phage, containing the expression vector and gene of interest, are added to the bacterial culture at a multiplicity of infection (MOI) of 10 and allowed to infect the cells for 30 minutes. Gene expression is then induced by the addition of IPTG to a final concentration of 0.5 mM and the culture allowed to grow overnight. Media collection and cell fractionation are as described elsewhere.

Bacterial Cell Fractionation

After heterologous gene expression the bacterial cell culture can be separated into the following fractions: conditioned medium, periplasmic fraction and post-periplasmic cell lysate. This is achieved using the following procedures.

The culture is centrifuged to pellet the bacteria, allowing the supernatant to be stored as conditioned medium. This fraction contains any exported proteins. The pellet is taken up in 20% sucrose, 30 mM Tris pH 8 and 1 mM EDTA (80 ml of buffer per gram of fresh weight pellet) and allowed to sit at room temperature for 10 minutes. The cells are repelleted and taken up in the same volume of ice cold 5 mM MgSO₄ and left on ice for 10 minutes. Following centrifugation, to pellet the cells, the supernatant (periplasmic fraction) is stored. A second round of osmotic shock fractionation can be undertaken if desired.

The post-periplasmic pellet can be further lysed as follows. The pellet is resuspended in 1.5 ml of 20% sucrose, 40 mM Tris pH 8, 50 mM EDTA and 2.5 mg of lysozyme (per gram fresh weight of starting pellet). After 15 minutes at room temperature, 1.15 ml of 0.1% Triton X is added together with 300 μl of 5M NaCl and incubated for a further 15 minutes. 2.5 ml of 0.2 M triethanolamine (pH 7.8), 150 μl of 1M CaCl₂, 100 μl of 1M MgCl₂ and 5 μg of DNase are added and allowed to incubate, with end-over-end mixing, for 20 minutes to reduce viscosity. This is followed by centrifugation, with the supernatant being retained as the post-periplasmic lysate.

The present invention is not, of course, limited to any particular expression system, whether bacterial or not.

EXAMPLE IX CONSTRUCTION OF AN ITI-DOMAIN I/GENE III DISPLAY VECTOR

1. ITI domain I as an IPBD

Inter-α-trypsin inhibitor (ITI) is a large (M_(r) ca 240,000) circulating protease inhibitor found in the plasma of many mammalian species (for recent reviews see ODOM90, SALI90, GEBH90, GEBH86). The intact inhibitor is a glycoprotein and is currently believed to consist of three glycosylated subunits that interact through a strong glycosaminoglycan linkage (ODOM90, SALI90, ENGH89, SELL87). The anti-trypsin activity of ITI is located on the smallest subunit (ITI light chain, unglycosylated M_(r) ca 15,000) which is identical in amino acid sequence to an acid stable inhibitor found in urine (UTI) and serum (STI) (GEBH86, GEBH90). The mature light chain consists of a 21 residue N-terminal sequence, glycosylated at SER₁₀, followed by two tandem Kunitz-type domains, the first of which is glycosylated at ASN₄₅ (ODOM90). In the human protein, the second Kunitz-type domain has been shown to inhibit trypsin, chymotrypsin, and plasmin (ALBR83a, ALBR83b, SELL87, SWAI88). The first domain lacks these activities but has been reported to inhibit leukocyte elastase (10⁻⁶ > K_(i) > 10⁻⁹) (ALBR83a,b, ODOM90). cDNA encoding the ITI light chain also codes for α-1-microglobulin (TRAB86, KAUM86, DIAR90); the proteins are separated post-translationally by proteolysis.

The N-terminal Kunitz-type domain of the ITI light chain (ITI-D1, comprising residues 22 to 76 of the UTI sequence shown in FIG. 1 of GEBH86) possesses a number of characteristics that make it useful as an IPBD. The domain is highly homologous to both BPTI and the EpiNE series of proteins described elsewhere in the present application. Although an x-ray structure of the isolated domain is not available, crystallographic studies of the related Kunitz-type domain isolated from the Alzheimer's amyloid β-protein (AAβP) precursor show that this polypeptide assumes a crystal structure almost identical to that of BPTI (HYNE90). Thus, it is likely that the solution structure of the isolated ITI-D1 polypeptide will be highly similar to the structures of BPTI and AAβP. In this case, the advantages described previously for use of BPTI as an IPBD apply to ITI-D1. ITI-D1 provides additional advantages as an IPBD for the development of specific anti-elastase inhibitory activity. First, this domain has been reported to inhibit both leukocyte elastase (ALBR83a,b, ODOM90) and Cathepsin-G (SWAI88, ODOM90), activities which BPTI lacks. Second, ITI-D1 lacks affinity for the related serine proteases trypsin, chymotrypsin, and plasmin (ALBR83a,b, SWAI88), an advantage for the development of specificity in inhibition. Finally, ITI-D1 is a human-derived polypeptide, so derivatives are anticipated to show minimal antigenicity in clinical applications.

2. Construction of the display vector

For purposes of this discussion, numbering of the nucleic acid sequence for the ITI light chain gene is that of TRAB86 and of the amino acid sequence is that shown for UTI in FIG. 1 of GEBH86. DNA manipulations were conducted according to standard methods as described in SAMB89 and AUSU87.

The protein sequence of human ITI-D1 consists of 56 amino acid residues extending from LYS₂₂ to ARG₇₇ of the complete ITI light chain sequence. This sequence is encoded by the 168 bases between positions 750 and 917 in the cDNA sequence presented in TRAB86. The majority of the domain is contained between a BglI site spanning bases 763 to 773 and a PstI site spanning bases 903 to 908. The insertion of the ITI-D1 sequence into M13 gene III was conducted in two steps. First, a linker containing the appropriate ITI sequences outside the central BglI to PstI region was ligated into the NarI site of phage MA RF DNA. In the second step, the remainder of the ITI-D1 sequence was incorporated into the linker-bearing phage RF DNA.
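The residue and base counts quoted above can be cross-checked with simple coordinate arithmetic. The sketch below assumes that base 750 is the first base of the LYS₂₂ codon, which is what the stated span of 168 bases from 750 to 917 implies; the helper name is illustrative.

```python
FIRST_RESIDUE = 22  # LYS22, first residue of ITI-D1
LAST_RESIDUE = 77   # ARG77, last residue of ITI-D1
FIRST_BASE = 750    # assumed first base of the LYS22 codon (TRAB86 numbering)

def codon_span(residue):
    """Return (first base, last base) of the codon for a given residue."""
    start = FIRST_BASE + 3 * (residue - FIRST_RESIDUE)
    return start, start + 2

n_residues = LAST_RESIDUE - FIRST_RESIDUE + 1
n_bases = 3 * n_residues
last_codon = codon_span(LAST_RESIDUE)

print(n_residues, n_bases, last_codon)
```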

The linker DNA consisted of two synthetic oligonucleotides (top and bottom strands) which, when annealed, produced a 54 bp double-stranded fragment with the following structure (5' to 3'):

    NARI OVERHANG/ITI-5'/BGLI/STUFFER/PSTI/ITI-3'/NARI OVERHANG

The NarI OVERHANG sequences provide compatible ends for ligation into a cut NarI site. The ITI-5' sequence consists of dsDNA corresponding to the thirteen positions from A750 to T762 immediately 5' adjacent to the BglI site in the ITI-D1 sequence. Two changes, both silent, are introduced in this sequence: T to C at position 758 (changes codon for ASP₂₄ from GAT to GAC) and G to T at position 761 (changes codon for SER₂₅ from TCG to TCT). The sequences BGLI and PSTI are identical to the BglI and PstI sites, respectively, in the ITI-D1 sequence. The ITI-3' sequence consists of dsDNA corresponding to the nine positions from A909 to T917 immediately 3' adjacent to the PstI site in the ITI-D1 sequence. The one base change included in this sequence, A to T at position 917, is silent and changes the codon for ARG₇₇ from CGA to CGT. The STUFFER sequence consists of dsDNA encoding three residues (5' to 3'): LEU (TTA), TRP (TGG), and SER (TCA). The reverse complement of the STUFFER sequence encodes two translation termination codons (TGA and TAA). Phage expressing gene III containing the linker in opposite orientation to that shown above will not produce a functional gene III product.
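The statement that the reverse complement of the STUFFER contains two termination codons can be verified directly from the three codons given. This is a minimal sketch; it assumes that, in the reverse orientation, the reading frame falls on the same codon boundaries as the stuffer itself.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# STUFFER codons as given in the text: LEU (TTA), TRP (TGG), SER (TCA).
stuffer = "TTA" + "TGG" + "TCA"

rc = reverse_complement(stuffer)
rc_codons = [rc[i:i + 3] for i in range(0, len(rc), 3)]
stops = [codon for codon in rc_codons if codon in STOP_CODONS]

print(rc_codons)
print(stops)
```

A linker inserted in the wrong orientation therefore terminates translation of gene III, which is why such phage do not produce a functional gene III product.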

Phage MA RF DNA was digested with NarI and the linear ca. 8.2 kb fragment was gel purified and subsequently dephosphorylated using HK phosphatase (Epicentre). The linker oligonucleotides were annealed to form the linker fragment described above, which was then kinased using T4 Polynucleotide Kinase. The kinased linker was ligated to the NarI-digested MA RF DNA in a 10:1 (linker:RF) molar ratio. After 18 hrs at 16° C., the ligation was stopped by incubation at 65° C. for 10 min and the ligation products were ethanol precipitated in the presence of 10 μg of yeast tRNA. The dried precipitate was dissolved in 5 μl of water and used to transform D1210 cells by electroporation. After 60 min of growth in SOC at 37° C., transformed cells were plated onto LB plates supplemented with ampicillin (Ap, 200 μg/ml). RF DNA prepared from Ap^(r) isolates was subjected to restriction enzyme analysis. The DNA sequences of the linker insert and the immediately surrounding regions were confirmed by DNA sequencing. Phage strains containing the ITI linker sequence inserted into the NarI site in gene III are called MA-IL.

Phage MA-IL RF DNA was partially digested with BglI and the ca. 8.2 kb linear fragment was gel purified. This fragment was digested with PstI and the large linear fragment was gel purified. The BglI to PstI fragment of ITI-D1 was isolated from pMGIA (a plasmid carrying the sequence shown in TRAB86). pMGIA was digested to completion with BglI and the ca. 1.6 kb fragment was isolated by agarose gel electrophoresis and subsequent Geneclean (Bio101, La Jolla, CA) purification. The purified BglI fragment was digested to completion with PstI and EcoRI and the resulting mixture of fragments was used in a ligation with the BglI and PstI cut MA-IL RF DNA described above. Ligation, transformation, and plating were as described above. After 18 hr. of growth on LB Ap plates at 37° C., Ap^(r) colonies were harvested with LB broth supplemented with Ap (200 μg/ml) and the resulting cell suspension was grown for two hours at 37° C. Cells were pelleted by centrifugation (10 min at 5000 × g, 4° C.). The supernatant fluid was transferred to sterile centrifugation tubes and recentrifuged as above. The supernatant fluid from the second centrifugation step was retained as the phage stock POP1.

PCR was used to demonstrate the presence of phage containing the complete ITI-D1-III fusion gene. Upstream PCR primers, 1UP and 2UP, are located spanning nucleotides 1470 to 1494 and 1593 to 1618 of the phage M13 DNA sequence, respectively. A downstream PCR primer 3DN spans nucleotides 1779 to 1804. Two ITI-D1-specific primers, IAI-1 and IAI-2, are located spanning positions 789 to 810 and 894 to 914, respectively, in the ITI light chain sequence of TRAB86. IAI-1 and IAI-2 are used as downstream primers in PCR reactions with 1UP or 2UP. IAI-1 is entirely contained within the BglI to PstI region of the ITI-D1 sequence, while IAI-2 spans the PstI site in the ITI-D1 sequence. When aliquots of POP1 phage were used as substrates for PCR, template-specific products of characteristic size were produced in reactions containing 1UP or 2UP plus IAI-1 or IAI-2 primer pairs. No such products are obtained using MA-IL phage as template. No PCR products with sizes corresponding to complete ITI-D1-gene III templates were obtained using POP1 phage and the 1UP or 2UP plus 3DN primer pairs. This last result reflects the low abundance (<1%) of phage containing the complete ITI-D1 sequence in POP1.

Preparative PCR was used to generate substrate amounts of the 330 bp PCR product of a reaction using the 1UP and IAI-2 primer pair to amplify the POP1 template. The 330 bp PCR product was gel purified and then cut to completion with BglI and PstI. The 138 bp BglI to PstI fragment from ITI-D1 was isolated by agarose gel electrophoresis followed by Qiaex extraction (Qiagen, Studio City, CA). MA-IL phage RF DNA was digested to completion with PstI. The ca. 8.2 kb linear fragment was gel purified and subsequently digested to completion with BglI. The BglI digest was extracted once with phenol:chloroform (1:1), the aqueous phase was ethanol precipitated, and the pellet was dissolved in TE (pH 8.0). An aliquot of this solution was used in a ligation reaction with the 138 bp BglI to PstI fragment as described above. The ethanol-precipitated ligation products were used to transform XL1-Blue™ cells by electroporation and, after 1 hr growth in SOC at 37° C., cells were plated on LB Ap plates. A phage population, POP2, was prepared from Ap^(r) colonies as described previously.

Phage stocks obtained from individual plaques produced on titration of POP2 were tested by PCR for the presence of the complete ITI-D1-III gene fusion. PCR results indicate the entire fusion gene was present in seven of nine isolates tested. RF DNA from the seven isolates testing positive was subjected to restriction enzyme analysis. The complete sequence of the ITI-D1 insertion into gene III was confirmed in four of the seven isolates by DNA sequence analysis. Phage isolates containing the ITI-D1-III fusion gene are called MA-ITI.

3. Expression and display of ITI-D1

Expression of the ITI domain I-Gene III fusion protein and its display on the surface of phage were demonstrated by Western analysis and phage titer neutralization experiments.

For Western analysis, aliquots of PEG-purified phage preparations containing up to 4·10¹⁰ infective particles were subjected to electrophoresis on a 12.5% SDS-urea-polyacrylamide gel. Proteins were transferred to a sheet of Immobilon-P transfer membrane (Millipore, Bedford, MA) by electrotransfer. Western blots were developed using a rabbit anti-ITI serum (SALI87) which had previously been incubated with an E. coli extract, followed by goat anti-rabbit IgG conjugated to horseradish peroxidase (#401315, Calbiochem, La Jolla, CA). An immunoreactive protein with an apparent size of ca. 65-69 kD is detected in preparations of MA-ITI phage but not in preparations of the parental MA phage. The size of the immunoreactive protein is consistent with the expected size of the processed ITI-D1-III fusion protein (ca. 67 kD, as previously observed for the BPTI-III fusion protein).

Rabbit anti-BPTI serum has been shown to block the ability of MK-BPTI phage to infect E. coli cells (Example II). To test for a similar effect of rabbit anti-ITI serum on the infectivity of MA-ITI phage, 10 μl aliquots of MA or MA-ITI phage were incubated in 100 μl reactions containing 10 μl aliquots of PBS, normal rabbit serum (NRS), or anti-ITI serum. After a three hour incubation at 37° C., phage suspensions were titered to determine residual plaque-forming activity. These data are summarized in Table 211. Incubation of MA-ITI phage with rabbit anti-ITI serum reduces titers 10- to 100-fold, depending on initial phage titer. A much smaller decrease in phage titer (10 to 40%) is observed when MA-ITI phage are incubated with NRS. In contrast, the titer of the parental MA phage is unaffected by either NRS or anti-ITI serum.

Taken together, the results of the Western analysis and the phage-titer neutralization experiments are consistent with the expression of an ITI-DI-III fusion protein in MA-ITI phage, but not in the parental MA phage, such that ITI-specific epitopes are present on the phage surface. The ITI-specific epitopes are located with respect to III such that antibody binding to these epitopes prevents phage from infecting E. coli cells.

4. Fractionation of MA-ITI phage bound to agarose-immobilized protease beads

To test if phage displaying the ITI-DI-III fusion protein interact strongly with the proteases human neutrophil elastase (HNE) or cathepsin-G, aliquots of display phage were incubated with agarose-immobilized HNE or cathepsin-G beads (HNE beads or Cat-G beads, respectively). The beads were washed and bound phage eluted by pH fractionation as described in Examples II and III. The progression in lowering pH during the elution was: pH 7.0, 6.0, 5.5, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, and 2.0. Following elution and neutralization, the various input, wash, and pH elution fractions were titered.

The results of several fractionations are summarized in Table 212 (EpiNE-7 or MA-ITI phage bound to HNE beads) and Table 213 (EpiC-10 or MA-ITI phage bound to Cat-G beads). For the two types of beads (HNE or Cat-G), the pH elution profiles obtained using the control display phage (EpiNE-7 or EpiC-10, respectively) were similar to those seen previously (Examples II and III). About 0.3% of the EpiNE-7 display phage applied to the HNE beads were eluted during the fractionation procedure and the elution profile had a maximum for elution at about pH 4.0. A smaller fraction, 0.02%, of the EpiC-10 phage applied to the Cat-G beads were eluted and the elution profile displayed a maximum near pH 5.5.

The MA-ITI phage show no evidence of great affinity for either HNE or cathepsin-G immobilized on agarose beads. The pH elution profiles for MA-ITI phage bound to HNE or Cat-G beads show essentially monotonic decreases in phage recovered with decreasing pH. Further, the total fractions of the phage applied to the beads that were recovered during the fractionation procedures were quite low: 0.002% from HNE beads and 0.003% from Cat-G beads.

Published values of K_(i) for inhibition of neutrophil elastase by the intact, large (M_(r) =240,000) ITI protein range between 60 and 150 nM, and values between 20 and 6000 nM have been reported for the inhibition of Cathepsin G by ITI (SWAI88, ODOM90). Our own measurements of pH fractionation of display phage bound to HNE beads show that phage displaying proteins with low affinity (>μM) for HNE are not bound by the beads while phage displaying proteins with greater affinity (nM) bind to the beads and are eluted at about pH 5. If the first Kunitz-type domain of the ITI light chain is entirely responsible for the inhibitory activity of ITI against HNE, and if this domain is correctly displayed on the MA-ITI phage, then it appears that the minimum affinity of an inhibitor for HNE that allows binding and fractionation of display phage on HNE beads is 50 to 100 nM.

5. Alteration of the P1 region of ITI-DI.

If ITI-DI and EpiNE-7 assume the same configuration in solution as BPTI, then these two polypeptides have identical amino acid sequences in both the primary and secondary binding loops with the exception of four residues about the P1 position. For ITI-DI the sequence for positions 15 to 20 is (position 15 in ITI-DI corresponds to position 36 in the UTI sequence of GEBH86): MET15, GLY16, MET17, THR18, SER19, ARG20. In EpiNE-7 the equivalent sequence is: VAL15, ALA16, MET17, PHE18, PRO19, ARG20. These two proteins appear to differ greatly in their affinities for HNE. To improve the affinity of ITI-DI for HNE, the EpiNE-7 sequence shown above was incorporated into the ITI-DI sequence at positions 15 through 20.

The EpiNE-7 sequence was incorporated into the ITI-DI sequence in MA-ITI by cassette mutagenesis. The mutagenic cassette consisted of two synthetic 51 base oligonucleotides (top and bottom strands) which were annealed to make double-stranded DNA containing an Eag I overhang at the 5' end and a Sty I overhang at the 3' end. The DNA sequence between the Eag I and Sty I overhangs is identical to the ITI-DI sequence between these sites except at four codons: the codon for position 15, ATG (MET), was changed to GTC (VAL); the codon for position 16, GGA (GLY), was changed to GCT (ALA); the codon for position 18, ACC (THR), was changed to TTC (PHE); and the codon for position 19, AGC (SER), was changed to CCA (PRO). MA-ITI RF DNA was digested with Eag I and Sty I. The large, linear fragment was gel purified and used in a ligation with the mutagenic cassette described above. Ligation products were used to transform XL1-Blue™ cells as described previously. Phage stocks obtained from overnight cultures of Ap^(r) transductants were screened by PCR for incorporation of the altered sequence and the changes in the codons for positions 15, 16, 18, and 19 were confirmed by DNA sequencing. Phage isolates containing the ITI-DI-III fusion gene with the EpiNE-7 changes around the P1 position are called MA-ITI-E7.
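The four codon replacements can be checked against the ITI-DI and EpiNE-7 sequences given for positions 15 to 20. The following is a minimal sketch using only the standard-code entries involved; MET is encoded solely by ATG, and the data structures are illustrative.

```python
# Standard genetic code entries for the codons named in the text.
CODON_TO_AA = {
    "ATG": "MET", "GTC": "VAL", "GGA": "GLY", "GCT": "ALA",
    "ACC": "THR", "TTC": "PHE", "AGC": "SER", "CCA": "PRO",
}

# ITI-DI residues at positions 15 to 20, as listed in the text.
ITI_DI = {15: "MET", 16: "GLY", 17: "MET", 18: "THR", 19: "SER", 20: "ARG"}

# Codon replacements carried by the mutagenic cassette: position -> (old, new).
CHANGES = {
    15: ("ATG", "GTC"),
    16: ("GGA", "GCT"),
    18: ("ACC", "TTC"),
    19: ("AGC", "CCA"),
}

def apply_changes(residues, changes):
    """Return the residue map after applying the codon replacements."""
    mutated = dict(residues)
    for position, (old, new) in changes.items():
        # The original codon must encode the residue being replaced.
        assert CODON_TO_AA[old] == residues[position]
        mutated[position] = CODON_TO_AA[new]
    return mutated

iti_di_e7 = apply_changes(ITI_DI, CHANGES)
print([f"{aa}{pos}" for pos, aa in sorted(iti_di_e7.items())])
```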

6. Fractionation of MA-ITI-E7 phage.

To test if the changes at positions 15, 16, 18, and 19 of the ITI-DI-III fusion protein influence binding of display phage to HNE beads, abbreviated pH elution profiles were measured. Aliquots of EpiNE-7, MA-ITI, and MA-ITI-E7 display phage were incubated with HNE beads for three hours at room temperature. The beads were washed and phage were eluted as described (Example III), except that only three pH elutions were performed: pH 7.0, 3.5, and 2.0. The results of these elutions are shown in Table 214.

Binding and elution of the EpiNE-7 and MA-ITI display phage were found to be as previously described. The total fraction of input phage eluted was high (0.4%) for EpiNE-7 phage and low (0.001%) for MA-ITI phage. Further, the EpiNE-7 phage showed maximum phage elution in the pH 3.5 fraction while the MA-ITI phage showed only a monotonic decrease in phage yields with decreasing pH, as seen above.

The two strains of MA-ITI-E7 phage show increased levels of binding to HNE beads relative to MA-ITI phage. The total fraction of the input phage eluted from the beads is 10-fold greater for both MA-ITI-E7 phage strains than for MA-ITI phage (although still 40-fold lower than for EpiNE-7 phage). Further, the pH elution profiles of the MA-ITI-E7 phage strains show maximum elutions in the pH 3.5 fractions, similar to EpiNE-7 phage.

To further define the binding properties of MA-ITI-E7 phage, the extended pH fractionation procedure described previously was performed using phage bound to HNE beads. These data are summarized in Table 215. The pH elution profile of EpiNE-7 display phage is as previously described. In this more resolved pH elution profile, MA-ITI-E7 phage show a broad elution maximum centered around pH 5. Once again, the total fraction of MA-ITI-E7 phage obtained on pH elution from HNE beads was about 40-fold less than that obtained using EpiNE-7 display phage.

The pH elution behavior of MA-ITI-E7 phage bound to HNE beads is qualitatively similar to that seen using BPTI[K15L]-III-MA phage. BPTI with the K15L mutation has an affinity for HNE of ≈3×10⁻⁹ M. Assuming all else remains the same, the pH elution profile for MA-ITI-E7 suggests that the affinity of the free ITI-DI-E7 domain for HNE might be in the nM range. If this is the case, the substitution of the EpiNE-7 sequence in place of the ITI-DI sequence around the P1 region has produced a 20- to 50-fold increase in affinity for HNE (assuming K_(i) = 60 to 150 nM for the unaltered ITI-DI).

If EpiNE-7 and ITI-DI-E7 have the same solution structure, these proteins present the identical amino acid sequences to HNE over the interaction surface. Despite this similarity, EpiNE-7 exhibits a roughly 1000-fold greater affinity for HNE than does ITI-DI-E7. Again assuming similar structure, this observation highlights the importance of non-contacting secondary residues in modulating interaction strengths.

Native ITI light chain is glycosylated at two positions, SER10 and ASN45 (GEBH86). Removal of the glycosaminoglycan chains has been shown to decrease the affinity of the inhibitor for HNE about 5-fold (SELL87). Another potentially important difference between EpiNE-7 and ITI-DI-E7 is that of net charge. The changes in BPTI that produce EpiNE-7 reduce the total charge on the molecule from +6 to +1. Sequence differences between EpiNE-7 and ITI-DI-E7 further reduce the charge on the latter to -1. Furthermore, the change in net charge between these two molecules arises from sequence differences occurring in the central portions of the molecules. Position 26 is LYS in EpiNE-7 and is THR in ITI-DI-E7, while at position 31 these residues are GLN and GLU, respectively. These changes in sequence not only alter the net charge on the molecules but also position a negatively charged residue close to the interaction surface in ITI-DI-E7. It may be that the occurrence of a negative charge at position 31 (which is not found in any other of the HNE inhibitors described here) destabilizes the inhibitor-protease interaction.

EXAMPLE X GENERATION OF A VARIEGATED ITI-DI POPULATION

The following is a hypothetical example demonstrating how to obtain a derivative of ITI having high affinity for HNE.

The results of Example IX demonstrate that the nature of the protein sequence around the P1 position in ITI-DI can significantly influence the strength of the interaction between ITI-DI and HNE. While incorporation of the EpiNE-7 sequence increases the affinity of ITI-DI for HNE, it is unlikely that this particular sequence is optimal for binding.

We generate a large population of potential binding proteins having differing sequences in the P1 region of ITI-DI using the oligonucleotide ITIMUT. ITIMUT is designed to incorporate variegation in ITI-DI at the six positions about and including the P1 residue: 13, 15, 16, 17, 18, and 19. ITIMUT is synthesized as one long (top strand) 73 base oligonucleotide and one shorter (24 base) bottom strand oligonucleotide. The top strand sequence extends from position 770 (G) to position 842 (G) in the sequence of TREB86. This sequence includes the codons for the positions of variegation as well as the recognition sequences for the flanking restriction enzymes Eag I (778 to 783) and Sty I (829 to 834). The bottom strand oligonucleotide comprises the complement of the sequence from positions 819 to 842.

To generate the mutagenic cassette, the top and bottom strand oligonucleotides are annealed and the resulting duplex is completed in an extension reaction using DNA polymerase. Following digestion of the 73 bp dsDNA with Eag I and Sty I, the purified 51 bp mutagenic cassette is ligated with the large linear fragment obtained from a similar digestion of MA-ITI RF DNA. Ligation products are used to transform competent cells by electroporation and phage stocks produced from Ap^(r) transductants are analyzed for the presence and nature of novel sequences as described previously.
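The oligonucleotide and cassette lengths quoted above can be checked from the TREB86 coordinates. The following is a minimal sketch, not part of the original procedure. It assumes that Eag I (C/GGCCG) and Sty I (C/CWWGG) each cut the top strand after the first base of the recognition site; the variable and function names are illustrative only.

```python
# Positions are TREB86 coordinates quoted in the text.
TOP_STRAND = (770, 842)      # long top-strand oligonucleotide
BOTTOM_REGION = (819, 842)   # region whose complement forms the bottom strand
EAG_I_SITE = (778, 783)      # recognition sequence C/GGCCG
STY_I_SITE = (829, 834)      # recognition sequence C/CWWGG

# Assumption: both enzymes cut the top strand after the first base of the site.
CUT_AFTER_FIRST_BASE = 1


def span_length(start, end):
    """Number of bases in an inclusive coordinate range."""
    return end - start + 1


# First and last top-strand bases retained in the excised cassette.
cassette_start = EAG_I_SITE[0] + CUT_AFTER_FIRST_BASE       # 779
cassette_end = STY_I_SITE[0] + CUT_AFTER_FIRST_BASE - 1     # 829

top_length = span_length(*TOP_STRAND)
bottom_length = span_length(*BOTTOM_REGION)
cassette_length = span_length(cassette_start, cassette_end)

print("top strand:", top_length, "bases")
print("bottom strand:", bottom_length, "bases")
print("cassette after Eag I / Sty I digestion:", cassette_length, "bases per strand")
```

Under these assumptions the coordinates reproduce the 73 base top strand, the 24 base bottom strand, and the 51 base cassette strands stated in the text.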

The variegation in the ITIMUT cassette is confined to the codons for the six positions in ITI-DI (13, 15, 16, 17, 18 and 19), and employs three different nucleotide mixes: N, R, and S. For this mutagenesis, the composition of the N-mix is 36% A, 17% C, 23% G, and 24% T, and corresponds to the N-mix composition in the optimized NNS codon described elsewhere. The R-mix composition is 50% A, 50% G, and the S-mix composition is 50% C, 50% G.

The codon for ITI-DI position 13 (CCC, PRO) is changed to SNG in ITIMUT. This codon encodes the eight residues PRO, VAL, GLU, ALA, GLY, LEU, GLN, and ARG. The encoded group includes the parental residue (PRO) as well as the more commonly observed variants at the position, ARG and LEU (see Table 15), and also provides for the occurrence of acidic (GLU), large polar (GLN) and nonpolar (VAL), and small (ALA, GLY) residues.

The codons for positions 15 and 17 (ATG, MET) are changed to the optimized NNS codon. All 20 natural amino acid residues and a translation termination codon are allowed.

The codon for position 16 (GGA, GLY) is changed to RNS in ITIMUT. This codon encodes the twelve amino acids GLY, ALA, ASP, GLU, VAL, MET, ILE, THR, SER, ARG, ASN, and LYS. The encoded group includes the most commonly observed residues at this position, ALA and GLY, and provides for the occurrence of both positively (ARG, LYS) and negatively (GLU, ASP) charged amino acids. Large nonpolar residues are also included (ILE, MET, VAL).

Finally, at positions 18 and 19, the ITI-DI sequence is changed from ACC.AGC (THR SER) to NNT.NNT. The NNT codon encodes the fifteen amino acid residues PHE, SER, TYR, CYS, LEU, PRO, HIS, ARG, ILE, THR, ASN, VAL, ALA, ASP, and GLY. This group includes the parental residues; the further advantages of the NNT codon have been discussed elsewhere.

The ITIMUT DNA sequence encodes a total of:

    8 * 20 * 12 * 20 * 15 * 15=8,640,000

different protein sequences in a total of:

    2²⁵ = 33,554,432

different DNA sequences. The total number of protein sequences encoded by ITIMUT is only 7.4-fold fewer than the total possible number of natural sequences obtained from variation at six positions (20⁶ = 6.4×10⁷). However, this degree of variation in protein sequence is obtained from a minimum of 1.07×10⁹ (NNS⁶ = 2³⁰) DNA sequences, a 32-fold greater number than that comprising ITIMUT. Thus, ITIMUT is an efficient vehicle for the generation of a large and diverse population of potential binding proteins.
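The counts given above for ITIMUT can be reproduced by enumerating each variegated codon against the standard genetic code. The sketch below is illustrative only: it treats every base of a mix as simply allowed or not allowed (ignoring the unequal N-mix proportions), and names such as `ITIMUT_CODONS` and `encoded_residues` are hypothetical helpers, not identifiers from the source.

```python
from itertools import product
from math import prod

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Nucleotide mixes used in ITIMUT (composition ignored, membership only).
MIXES = {"N": "ACGT", "R": "AG", "S": "CG", "G": "G", "T": "T"}

# Variegated codon at each ITI-DI position, as described in the text.
ITIMUT_CODONS = {13: "SNG", 15: "NNS", 16: "RNS", 17: "NNS", 18: "NNT", 19: "NNT"}


def expand(pattern):
    """All concrete codons matching a three-letter mix pattern."""
    return ["".join(c) for c in product(*(MIXES[ch] for ch in pattern))]


def encoded_residues(pattern):
    """Set of amino acids (stop codons excluded) encoded by a pattern."""
    return {CODON_TABLE[c] for c in expand(pattern)} - {"*"}


n_protein = prod(len(encoded_residues(p)) for p in ITIMUT_CODONS.values())
n_dna = prod(len(expand(p)) for p in ITIMUT_CODONS.values())

for position, pattern in ITIMUT_CODONS.items():
    print(position, pattern, len(expand(pattern)), "codons,",
          len(encoded_residues(pattern)), "amino acids")
print("protein sequences:", n_protein)
print("DNA sequences:", n_dna)
print("ratio of NNS x 6 to ITIMUT DNA sequences:", 32 ** 6 // n_dna)
```

Enumeration of this kind is a convenient check when a different set of codon patterns is substituted for the ones listed here.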

EXAMPLE XI DEVELOPMENT AND SELECTION OF BPTI MUTANTS FOR BINDING TO HORSE HEART MYOGLOBIN (HHMB)

The following example is hypothetical and illustrates alternative embodiments of the invention not given in other examples.

HHMb is chosen as a typical protein target; any other protein could be used. HHMb satisfies all of the criteria for a target: 1) it is large enough to be applied to an affinity matrix, 2) after attachment it is not reactive, and 3) after attachment there is sufficient unaltered surface to allow specific binding by PBDs.

The essential information for HHMb is known: 1) HHMb is stable at least up to 70° C., between pH 4.4 and 9.3, 2) HHMb is stable up to 1.6 M guanidinium Cl, 3) the pI of HHMb is 7.0, 4) for HHMb, M_(r) = 16,000, 5) HHMb requires haem, 6) HHMb has no proteolytic activity.

In addition, the following information about HHMb and other myoglobins is available: 1) the sequence of HHMb is known, 2) the 3D structure of sperm whale myoglobin is known; HHMb has 19 amino acid differences and it is generally assumed that the 3D structures are almost identical, 3) HHMb has no enzymatic activity, 4) HHMb is not toxic.

We set the specifications of an SBD as: 1) T = 25° C.; 2) pH = 8.0; 3) acceptable solutes: (A) for binding: i) phosphate, as buffer, 0 to 20 mM, and ii) KCl, 10 mM; (B) for column elution: i) phosphate, as buffer, 0 to 30 mM, ii) KCl, up to 5 M, and iii) guanidinium Cl, up to 0.8 M; 4) acceptable K_(d) < 1.0×10⁻⁸ M.

As stated in Sec. III.B, the residues to be varied are picked, in part, through the use of interactive computer graphics to visualize the structures. In this example, all residue numbers refer to BPTI. We pick a set of residues that forms a surface such that all residues can contact one target molecule. Information that we refer to during the process of choosing residues to vary includes: 1) the 3D structure of BPTI, 2) solvent accessibility of each residue as computed by the method of Lee and Richards (LEEB71), 3) a compilation of sequences of other proteins homologous to BPTI, and 4) knowledge of the structural nature of different amino acid types.

Tables 16 and 34 indicate which residues of BPTI: a) have substantial surface exposure, and b) are known to tolerate other amino acids in other closely related proteins. We use interactive computer graphics to pick sets of eight to twenty residues that are exposed and variable and such that all members of one set can touch a molecule of the target material at one time. If BPTI has a small amino acid at a given residue, that amino acid may not be able to contact the target simultaneously with all the other residues in the interaction set, but a larger amino acid might well make contact. A charged amino acid might affect binding without making direct contact. In such cases, the residue should be included in the interaction set, with a notation that larger residues might be useful. In a similar way, large amino acids near the geometric center of the interaction set may prevent residues on either side of the large central residue from making simultaneous contact. If a small amino acid, however, were substituted for the large amino acid, then the surface would become flatter and residues on either side could make simultaneous contact. Such a residue should be included in the interaction set with a notation that small amino acids may be useful.

Table 35 was prepared from standard model parts and shows the maximum span between C_(β) and the tip of each type of side group. C_(β) is used because it is rigidly attached to the protein main-chain; rotation about the C_(α)-C_(β) bond is the most important degree of freedom for determining the location of the side group.

Table 34 indicates five surfaces that meet the given criteria. The first surface comprises the set of residues that actually contacts trypsin in the complex of trypsin with BPTI as reported in the Brookhaven Protein Data Bank entry "1TPA". This set is indicated by the number "1". The exposed surface of the residues in this set (taken from Table 16) totals 1148 Å². Although this is not strictly the area of contact between BPTI and trypsin, it is approximately the same.

Other surfaces, numbered 2 to 5, were picked by first picking one exposed, variable residue and then picking neighboring residues until a surface was defined. The choice of sets of residues shown in Table 34 is in no way exhaustive or unique; other sets of variable, surface residues can be picked. Set #2 is shown in stereo view, FIG. 14, including the α carbons of BPTI, the disulfide linkages, and the side groups of the set. We take the orientation of BPTI in FIG. 14 as a standard orientation and hereinafter refer to K15 as being at the top of the molecule, while the carboxy and amino termini are at the bottom.

Solvent accessibilities are useful, easily tabulated indicators of a residue's exposure. Solvent accessibilities must be used with some caution; small amino acids are under-represented and large amino acids over-represented. The user must consider what the solvent accessibility of a different amino acid would be when substituted into the structure of BPTI.

To create specific binding between a derivative of BPTI and HHMb, we will vary the residues in set #2. This set includes the twelve principal residues 17(R), 19(I), 21(Y), 27(A), 28(G), 29(L), 31(Q), 32(T), 34(V), 48(A), 49(E), and 52(M) (Sec. III.B). None of the residues in set #2 is completely conserved in the sample of sequences reported in Table 34; thus we can vary them with a high probability of retaining the underlying structure. Independent substitution at each of these twelve residues of the amino acid types observed at that residue would produce approximately 4.4×10⁹ amino acid sequences and the same number of surfaces.

BPTI is a very basic protein. This property has been used in isolating and purifying BPTI and its homologues, so that the high frequency of arginine and lysine residues may reflect bias in isolation and is not necessarily required by the structure. Indeed, SCI-III from Bombyx mori contains seven more acidic than basic groups (SASA84).

Residue 17 is highly variable and fully exposed and can contain R, K, A, Y, H, F, L, M, T, G, Y, P, or S. All types of amino acids are seen: large, small, charged, neutral, and hydrophobic. That no acidic groups are observed may be due to bias in the sample.

Residue 19 is also variable and fully exposed, containing P, R, I, S, K, Q, and L.

Residue 21 is not very variable, containing F or Y in 31 of 33 cases and I and W in the remaining cases. The side group of Y21 fills the space between T32 and the main chain of residues 47 and 48. The OH at the tip of the Y side group projects into the solvent. Clearly one can vary the surface by substituting Y or F so that the surface is either hydrophobic or hydrophilic in that region. It is also possible that the other aromatic amino acid (viz. H) or the other hydrophobics (L, M, or V) might be tolerated.

Residue 27 most often contains A, but S, K, L, and T are also observed. On structural grounds, this residue will probably tolerate any hydrophilic amino acid and perhaps any amino acid.

Residue 28 is G in BPTI. This residue is in a turn, but is not in a conformation peculiar to glycine. Six other types of amino acids have been observed at this residue: K, N, Q, R, H, and N. Small side groups at this residue might not contact HHMb simultaneously with residues 17 and 34. Large side groups could interact with HHMb at the same time as residues 17 and 34. Charged side groups at this residue could affect binding of HHMb on the surface defined by the other residues of the principal set. Any amino acid, except perhaps P, should be tolerated.

Residue 29 is highly variable, most often containing L. This fully exposed position will probably tolerate almost any amino acid except, perhaps, P.

Residues 31, 32, and 34 are highly variable, exposed, and in extended conformations; any amino acid should be tolerated.

Residues 48 and 49 are also highly variable and fully exposed; any amino acid should be tolerated.

Residue 52 is in an α helix. Any amino acid, except perhaps P, might be tolerated.

Now we consider possible variation of the secondary set (Sec. 13.1.2) of residues that are in the neighborhood of the principal set. Neighboring residues that might be varied at later stages include 9(P), 11(T), 15(K), 16(A), 18(I), 20(R), 22(F), 24(N), 26(K), 35(Y), 47(S), 50(D), and 53(R).

Residue 9 is highly variable, extended, and exposed. Residue 9 and residues 48 and 49 are separated by a bulge caused by the ascending chain from residue 31 to 34. For residue 9 and residues 48 and 49 to contribute simultaneously to binding, either the target must have a groove into which the chain from 31 to 34 can fit, or all three residues (9, 48, and 49) must have large amino acids that effectively reduce the radius of curvature of the BPTI derivative.

Residue 11 is highly variable, extended, and exposed. Residue 11, like residue 9, is slightly far from the surface defined by the principal residues and will contribute to binding in the same circumstances.

Residue 15 is highly varied. The side group of residue 15 points away from the face defined by set #2. Changes of charge at residue 15 could affect binding on the surface defined by residue set #2.

Residue 16 is varied but points away from the surface defined by the principal set. Changes in charge at this residue could affect binding on the face defined by set #2.

Residue 18 is I in BPTI. This residue is in an extended conformation and is exposed. Five other amino acids have been observed at this residue: M, F, L, V, and T. Only T is hydrophilic. The side group points directly away from the surface defined by residue set #2. Substitution of charged amino acids at this residue could affect binding at the surface defined by residue set #2.

Residue 20 is R in BPTI. This residue is in an extended conformation and is exposed. Four other amino acids have been observed at this residue: A, S, L, and Q. The side group points directly away from the surface defined by residue set #2. Alteration of the charge at this residue could affect binding at the surface defined by residue set #2.

Residue 22 is only slightly varied, being Y, F, or H in 30 of 33 cases. Nevertheless, A, N, and S have been observed at this residue. Amino acids such as L, M, I, or Q could be tried here. Alterations at residue 22 may affect the mobility of residue 21; changes in charge at residue 22 could affect binding at the surface defined by residue set #2.

Residue 24 shows some variation, but probably cannot interact with one molecule of the target simultaneously with all the residues in the principal set. Variation in charge at this residue might have an effect on binding at the surface defined by the principal set.

Residue 26 is highly varied and exposed. Changes in charge may affect binding at the surface defined by residue set #2; substitutions may affect the mobility of residue 27, which is in the principal set.

Residue 35 is most often Y; W has also been observed. The side group of 35 is buried, but substitution of F or W could affect the mobility of residue 34.

Residue 47 is always T or S in the sequence sample used. The O_(gamma) probably accepts a hydrogen bond from the NH of residue 50 in the alpha helix. Nevertheless, there is no overwhelming steric reason to preclude other amino acid types at this residue. In particular, other amino acids the side groups of which can accept hydrogen bonds, viz. N, D, Q, and E, may be acceptable here.

Residue 50 is often an acidic amino acid, but other amino acids are possible.

Residue 53 is often R, but other amino acids have been observed at this residue. Changes of charge may affect binding to the amino acids in interaction set #2.

Stereo FIG. 14 shows the residues in set #2, plus R39. From FIG. 14, one can see that R39 is on the opposite side of BPTI from the surface defined by the residues in set #2. Therefore, variation at residue 39 at the same time as variation of some residues in set #2 is much less likely to improve binding that occurs along surface #2 than is variation of the other residues in set #2.

In addition to the twelve principal residues and 13 secondary residues, there are two other residues, 30(C) and 33(F), involved in surface #2 that we will probably not vary, at least not until late in the procedure. These residues have their side groups buried inside BPTI and are conserved. Changing these residues does not change the surface nearly so much as does changing residues in the principal set. These buried, conserved residues do, however, contribute to the surface area of surface #2. The surface of residue set #2 is comparable to the area of the trypsin-binding surface. Principal residues 17, 19, 21, 27, 28, 29, 31, 32, 34, 48, 49, and 52 have a combined solvent-accessible area of 946.9 Å². Secondary residues 9, 11, 15, 16, 18, 20, 22, 24, 26, 35, 47, 50, and 53 have a combined surface of 1041.7 Å². Residues 30 and 33 have exposed surface totaling 38.2 Å². Thus the three groups' combined surface is 2026.8 Å².

Residue 30 is C in BPTI and is conserved in all homologous sequences. It should be noted, however, that C14/C38 is conserved in all natural sequences, yet Marks et al. (MARK87) showed that changing both C14 and C38 to A,A or T,T yields a functional trypsin inhibitor. Thus it is possible that BPTI-like molecules will fold if C30 is replaced.

Residue 33 is F in BPTI and in all homologous sequences. Visual inspection of the BPTI structure suggests that substitution of Y, M, H, or L might be tolerated.

Having identified twenty residues that define a possible binding surface, we must choose some to vary first. Assuming a hypothetical affinity separation sensitivity, C_(sensi), of 1 in 4×10⁸, we decide to vary six residues (leaving some margin for error in the actual base composition of variegated bases). To obtain maximal recognition, we choose residues from the principal set that are as far apart as possible. Table 36 shows the distances between the β carbons of residues in the principal and peripheral sets. R17 and V34 are at one end of the principal surface. Residues A27, G28, L29, A48, E49, and M52 are at the other end, about twenty Angstroms away; of these, we will vary residues 17, 27, 29, 34, and 48. Residues 28, 49, and 52 will be varied at later rounds.

Of the remaining principal residues, 21 is left to later variations. Among residues 19, 31, and 32, we arbitrarily pick 19 to vary.

Unlimited variation of six residues produces 6.4×10⁷ amino acid sequences. By hypothesis, C_(sensi) is 1 in 4×10⁸. Table 37 shows the programmed variegation at the chosen residues. The parental sequence is present as 1 part in 5.5×10⁷, but the least favored sequences are present at only 1 part in 4.2×10⁹. Among single-amino-acid substitutions from the PPBD, the least favored is F17-I19-A27-L29-V34-A48 and has a calculated abundance of 1 part in 1.6×10⁸. Using the optimal qfk codon, we can recover the parental sequence and all one-amino-acid substitutions to the PPBD if actual nt compositions come within 5% of programmed compositions. The number of transformants is M_(ntv) = 1.0×10⁹ (also by hypothesis), thus we will produce most of the programmed sequences.
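The choice of six residues can be illustrated with a simplified count. The sketch below assumes, purely for illustration, that all twenty amino acids are equally abundant at each varied residue; the actual qfk codon gives unequal abundances, which is why the text quotes separate figures for the parental and least favored sequences. The names `C_SENSI`, `M_NTV`, and `n_sequences` are chosen here for readability.

```python
# Hypothetical values stated in the text.
C_SENSI = 4e8   # affinity separation can find 1 sequence in this many
M_NTV = 1.0e9   # number of independent transformants


def n_sequences(n_residues, alphabet_size=20):
    """Number of amino acid sequences from unlimited variation of n residues."""
    return alphabet_size ** n_residues


# Largest number of residues whose full variation stays within the sensitivity,
# under the simplifying assumption of equal abundance for every sequence.
max_residues = max(n for n in range(1, 13) if n_sequences(n) <= C_SENSI)

for n in range(4, 9):
    total = n_sequences(n)
    print(n, "residues:", total, "sequences;",
          "within C_sensi" if total <= C_SENSI else "beyond C_sensi")
print("maximum residues under uniform abundance:", max_residues)
print("transformants per sequence at six residues:", M_NTV / n_sequences(6))
```

Under this uniform approximation, six residues is the largest number that keeps every sequence above the assumed detection limit, which agrees with the decision described above.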

The residue numbers of the preceding section are referred to mature BPTI (R1-P2-...-A58). Table 25 has residue numbers referring to the pre-M13CP-BPTI protein; all mature BPTI sequence numbers have been increased by the length of the signal sequence, i.e. 23. Thus in terms of the pre-OSP-PBD residue numbers, we wish to vary residues 40, 42, 50, 52, 57, and 71. A DNA subsequence containing all these codons is found between the (ApaI/DraII/PssI) sites at base 191 and the Sph I site at base 309 of the osp-pbd gene. Among ApaI, DraII, and PssI, ApaI is preferred because it recognizes six bases without any ambiguity. DraII and PssI, on the other hand, recognize six bases with two-fold ambiguity at two of the bases. The vgDNA will contain more DraII and PssI recognition sites at the varied locations than it will contain ApaI recognition sites. The unwanted extraneous cutting of the vgDNA by ApaI and SphI will eliminate a few sequences from our population. This is a minor problem, but by using the more specific enzyme (ApaI), we minimize the unwanted effects. The sequence shown in Table 37 illustrates an additional way in which gratuitous restriction sites can be avoided in some cases. The osp-ipbd gene had the codon GGC for G51; because we are varying both residue 50 and 52, it is possible to obtain an ApaI site. If we change the glycine codon to GGT, the ApaI site can no longer arise. ApaI recognizes the DNA sequence (GGGCC/C).
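Both the renumbering and the glycine-codon argument above can be checked by brute force. The following sketch is illustrative only. It assumes that codons 50 and 52 may take any of the 64 triplets (a superset of what the variegated codon actually allows) and that only the nine bases of codons 50 to 52 are considered, so flanking sequence is ignored. The function names are hypothetical.

```python
from itertools import product

SIGNAL_LENGTH = 23        # length of the signal sequence, from the text
APAI_SITE = "GGGCCC"      # ApaI recognition sequence


def fusion_number(mature_residue):
    """Convert a mature BPTI residue number to pre-OSP-PBD numbering."""
    return mature_residue + SIGNAL_LENGTH


def apai_can_arise(glycine_codon):
    """True if some choice of codons 50 and 52 creates an ApaI site
    within the nine bases of codons 50-51-52 (assumption: any triplet
    is possible at the two varied codons)."""
    triplets = ["".join(t) for t in product("ACGT", repeat=3)]
    return any(
        APAI_SITE in codon50 + glycine_codon + codon52
        for codon50 in triplets
        for codon52 in triplets
    )


varied_mature = [17, 19, 27, 29, 34, 48]
print([fusion_number(n) for n in varied_mature])
print("GGC at residue 51 allows an ApaI site:", apai_can_arise("GGC"))
print("GGT at residue 51 allows an ApaI site:", apai_can_arise("GGT"))
```

The search finds a site with GGC (a codon 50 ending in G followed by GGC and a codon 52 beginning with CC) and none with GGT, in line with the reasoning in the text.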

Each piece of dsDNA to be synthesized needs six to eight bases added at either end to allow cutting with restriction enzymes and is shown in Table 37. The first synthetic base (before cutting with ApaI and SphI) is 184 and the last is 322. There are 142 bases to be synthesized. The center of the piece to be synthesized lies between Q54 and V57. The overlap cannot include varied bases, so we choose bases 245 to 256 as the overlap, which is 12 bases long. Note that the codon for F56 has been changed to TTC to increase the GC content of the overlap. The amino acids that are being varied are marked as X with a plus over them. Codons 57 and 71 are synthesized on the sense (bottom) strand. The design calls for "qfk" in the antisense strand, so that the sense strand contains (from 5' to 3') a) equal parts C and A (i.e. the complement of k), b) (0.40 T, 0.22 A, 0.22 C, and 0.16 G) (i.e. the complement of f), and c) (0.26 T, 0.26 A, 0.30 C, and 0.18 G) (i.e. the complement of q).

Each residue that is encoded by "qfk" has 21 possible outcomes, each of the amino acids plus stop. Table 12 gives the distribution of amino acids encoded by "qfk", assuming 5% errors. The abundance of the parental sequence is the product of the abundances of R×I×A×L×V×A. The abundance of the least-favored sequence is 1 in 4.2×10⁹.
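The way an amino acid distribution follows from the "qfk" base mixes can be sketched as below. This is an illustration, not a reproduction of Table 12: it uses the nominal compositions with no allowance for the 5% synthesis error, so its values differ somewhat from the figures quoted in the text. It assumes that q, f, and k are the first, second, and third codon bases, and that the third sense-strand mix listed above is the complement of q. The helper names are hypothetical.

```python
from math import prod

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_mix(mix):
    """Base composition of the strand complementary to a given mix."""
    return {COMPLEMENT[base]: frac for base, frac in mix.items()}


# Sense-strand mixes as listed in the text (items a, b, and c).
SENSE_A = {"C": 0.50, "A": 0.50}
SENSE_B = {"T": 0.40, "A": 0.22, "C": 0.22, "G": 0.16}
SENSE_C = {"T": 0.26, "A": 0.26, "C": 0.30, "G": 0.18}

K = complement_mix(SENSE_A)   # third codon base
F = complement_mix(SENSE_B)   # second codon base
Q = complement_mix(SENSE_C)   # first codon base (assumed to be q)


def residue_distribution(first, second, third):
    """Probability of each amino acid (and stop, '*') for one variegated codon."""
    dist = {}
    for a, pa in first.items():
        for b, pb in second.items():
            for c, pc in third.items():
                residue = CODON_TABLE[a + b + c]
                dist[residue] = dist.get(residue, 0.0) + pa * pb * pc
    return dist


dist = residue_distribution(Q, F, K)
parental = prod(dist[residue] for residue in "RIALVA")
least_favored = min(p for residue, p in dist.items() if residue != "*") ** 6

print("parental sequence: 1 part in %.2e" % (1 / parental))
print("least-favored sequence: 1 part in %.2e" % (1 / least_favored))
```

The same function can be rerun with perturbed compositions to explore how synthesis errors shift the abundance of the parental and least-favored sequences.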

Olig#27 and olig#28 are annealed and extended with Klenow fragment and all four (nt)TPs. Both the ds synthetic DNA and RF pLG7 DNA are cut with both ApaI and SphI. The cut DNA is purified and the appropriate pieces ligated (see Sec. 14.1) and used to transform competent PE383 (Sec. 14.2). In order to generate a sufficient number of transformants, V_(c) is set to 5000 ml.

1) culture E. coli in 5.0 l of LB broth at 37° C. until cell density reaches 5×10⁷ to 7×10⁷ cells/ml,

2) chill on ice for 65 minutes, centrifuge the cell suspension at 4000 g for 5 minutes at 4° C.,

3) discard supernatant; resuspend the cells in 1667 ml of an ice-cold, sterile solution of 60 mM CaCl₂,

4) chill on ice for 15 minutes, and then centrifuge at 4000 g for 5 minutes at 4° C.,

5) discard supernatant; resuspend cells in 2×400 ml of ice-cold, sterile 60 mM CaCl₂; store cells at 4° C. for 24 hours,

6) add DNA in ligation or TE buffer; mix and store on ice for 30 minutes; 20 ml of solution containing 5 μg/ml of DNA is used,

7) heat shock cells at 42° C. for 90 seconds,

8) add 200 ml LB broth and incubate at 37° C. for 1 hour,

9) add the culture to 2.0 l of LB broth containing ampicillin at 35-100 μg/ml and culture for 2 hours at 37° C.,

10) centrifuge at 8000 g for 20 minutes at 4° C.,

11) discard supernatant, resuspend cells in 50 ml of LB broth plus ampicillin and incubate 1 hour at 37° C.,

12) plate cells on LB agar containing ampicillin,

13) harvest virions by method of Salivar et al. (SALI64).

The heat shock of step (7) can be done by dividing the 200 ml into 100 200-μl aliquots in 1.5 ml plastic Eppendorf tubes. It is possible to optimize the heat shock for other volumes and kinds of container. It is important to: a) use all or nearly all the vgDNA synthesized in ligation (this will require large amounts of pLG7 backbone), b) use all or nearly all the ligation mixture to transform cells, and c) culture all or nearly all the transformants at high density. These measures are directed at maintaining diversity.

IPTG is added to the growth medium at 2.0 mM (the optimal level) and virions are harvested in the usual way. It is important to collect virions in a way that samples all or nearly all the transformants. Because F⁻ cells are used in the transformation, multiple infections do not pose a problem.

HHMb has a pI of 7.0 and we carry out chromatography at pH 8.0 so that HHMb is slightly negative while BPTI and most of its mutants are positive. HHMb is fixed (Sec. V.F) to a 2.0 ml column on Affi-Gel 10™ or Affi-Gel 15™ at 4.0 mg/ml support matrix, the same density that is optimal for a column supporting trp.

We note that charge repulsion between BPTI and HHMb should not be a serious problem and does not impose any constraints on ions or solutes allowed as eluants. Neither BPTI nor HHMb has special requirements that constrain choice of eluants. The eluant of choice is KCl in varying concentrations.

To remove variants of BPTI with strong, indiscriminate binding for any protein or for the support matrix, we pass the variegated population of virions over a column that supports bovine serum albumin (BSA) before loading the population onto the {HHMb} column. Affi-Gel 10™ or Affi-Gel 15™ is used to immobilize BSA at the highest level the matrix will support. A 10.0 ml column is loaded with 5.0 ml of Affi-Gel-linked-BSA; this column, called {BSA}, has V_(V) = 5.0 ml. The variegated population of virions containing 10¹² pfu in 1 ml (0.2×V_(V)) of 10 mM KCl, 1 mM phosphate, pH 8.0 buffer is applied to {BSA}. We wash {BSA} with 4.5 ml (0.9×V_(V)) of 50 mM KCl, 1 mM phosphate, pH 8.0 buffer. The wash with 50 mM salt will elute virions that adhere slightly to BSA but not virions with strong binding. The pooled effluent of the {BSA} column is 5.5 ml of approximately 13 mM KCl.

The column {HHMb} is first blocked by treatment with 10¹¹ virions of M13(am429) in 100 μl of 10 mM KCl buffered to pH 8.0 with phosphate; the column is washed with the same buffer until OD₂₆₀ returns to base line or 2×V_(V) have passed through the column, whichever comes first. The pooled effluent from {BSA} is added to {HHMb} in 5.5 ml of 13 mM KCl, 1 mM phosphate, pH 8.0 buffer. The column is eluted in the following way:

1) 10 mM KCl buffered to pH 8.0 with phosphate, until optical density at 280 nm falls to base line or 2×V_(V), whichever is first (effluent discarded),

2) a gradient of 10 mM to 2 M KCl in 3×V_(V), pH held at 8.0 with phosphate (30×100 μl fractions),

3) a gradient of 2 M to 5 M KCl in 3×V_(V), phosphate buffer to pH 8.0 (30×100 μl fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl in 2×V_(V), with phosphate buffer to pH 8.0 (20×100 μl fractions), and

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1×V_(V), with phosphate buffer to pH 8.0 (10×100 μl fractions).

In addition to the elution fractions, a sample is removed from the column and used as an inoculum for phage-sensitive Sup⁻ cells (Sec. V). A sample of 4 μl from each fraction is plated on phage-sensitive Sup⁻ cells. Fractions that yield too many colonies to count are replated at greater dilution. An approximate titre of each fraction is calculated. Starting with the last fraction and working toward the first fraction that was titered, we pool fractions until approximately 10⁹ phage are in the pool, i.e. about 1 part in 1000 of the phage applied to the column. This population is infected into 3×10¹¹ phage-sensitive PE384 in 300 ml of LB broth. The very low multiplicity of infection (moi) is chosen to reduce the possibility of multiple infection. After thirty minutes, viable phage have entered recipient cells but have not yet begun to produce new phage. Phage-borne genes are expressed at this phase, and we can add ampicillin that will kill uninfected cells. These cells still carry F-pili and will absorb phage, helping to prevent multiple infections.
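The effect of the low multiplicity of infection can be estimated with a short calculation. The sketch below assumes that phage are distributed among cells according to a Poisson distribution, a common textbook approximation that is not stated in the source; the variable names are illustrative.

```python
import math

phage_in_pool = 1e9      # phage pooled from the column fractions
recipient_cells = 3e11   # phage-sensitive PE384 cells

# Multiplicity of infection: average number of phage per cell.
moi = phage_in_pool / recipient_cells

# Poisson model (assumption): P(k phage in a cell) = exp(-moi) * moi**k / k!
p_infected = -math.expm1(-moi)                 # P(at least one phage)
p_single = moi * math.exp(-moi)                # P(exactly one phage)
p_multiple = p_infected - p_single             # P(two or more phage)

fraction_multiple = p_multiple / p_infected    # among infected cells only

print("moi = %.4f" % moi)
print("fraction of infected cells with more than one phage = %.4f"
      % fraction_multiple)
```

Under this model the fraction of multiply infected cells among infected cells is roughly half the moi, so reducing the number of phage per cell reduces mixed infections in proportion.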

If multiple infection should pose a problem that cannot be solved by growth at low multiplicity of infection on F⁺ cells, the following procedure can be employed to obviate the problem. Virions obtained from the affinity separation are infected into F⁺ E. coli and cultured to amplify the genetic messages (Sec. V). CCC DNA is obtained either by harvesting RF DNA or by in vitro extension of primers annealed to ss phage DNA. The CCC DNA is used to transform F⁻ cells at a high ratio of cells to DNA. Individual virions obtained in this way should bear only proteins encoded by the DNA within.

The phagemid population is grown and chromatographed three times and then examined for SBDs (Sec. V). In each separation cycle, phage from the last three fractions that contain viable phage are pooled with phage obtained by removing some of the support matrix as an inoculum. At each cycle, about 10¹² phage are loaded onto the column and about 10⁹ phage are cultured for the next separation cycle. After the third separation cycle, SBD colonies are picked from the last fraction that contained viable phage.

Each of the SBDs is cultured and tested for retention on a Pep-Tie column supporting HHMb. The phage showing the greatest retention on the Pep-Tie {HHMb} column is chosen. This SBD! becomes the parental amino-acid sequence for the second variegation cycle.

Assume for the sake of argument that, in SBD!, R40 changed to D, I42 changed to Q, A50 changed to E, L52 remained L, and A71 changed to W (see Table 38). If so, a rational plan for the second round of variegation would be that which is set forth in Table 39. The residues to be varied are chosen by: a) choosing some of the residues in the principal set that were not varied in the first round (viz. residues 42, 44, 51, 54, 55, 72, or 75 of the fusion), and b) choosing some residues in the secondary set. Residues 51, 54, 55, and 72 are varied through all twenty amino acids and, unavoidably, stop. Residue 44 is only varied between Y and F. Some residues in the secondary set are varied through a restricted range, primarily to allow different charges (+, 0, -) to appear. Residue 38 is varied through K, R, E, or G. Residue 41 is varied through I, V, K, or E. Residue 43 is varied through R, S, G, N, K, D, E, T, or A.

Now assume that in the most successful SBD of the second round of variegation (SBD-2!), residue 38 (K15 of BPTI) changed to E, 41 becomes V, 43 goes to N, 44 goes to F, 51 goes to F, 54 goes to S, 55 goes to A, and 72 goes to Q (see Table 40). A third round of variation is illustrated in Table 41; eight amino acids are varied. Those in the principal set, residues 40, 55, and 57, are varied through all twenty amino acids. Residue 32 is varied through P, Q, T, K, A, or E. Residue 34 is varied through T, P, Q, K, A, or E. Residue 44 is varied through F, L, Y, C, W, or stop. Residue 50 is varied through E, K, or Q. Residue 52 is varied through L, F, I, M, or V. The result of this variation is shown in Table 42.

This example is hypothetical. It is anticipated that more variegation cycles will be needed to achieve dissociation constants of 10⁻⁸ M. It is also possible that more than three separation cycles will be needed in some variegation cycles. Real DNA chemistry and DNA synthesizers may have larger errors than our hypothetical 5%. If S_(err) > 0.05, then we may not be able to vary six residues at once. Variation of 5 residues at once is certainly possible.

EXAMPLE XII DESIGN AND MUTAGENESIS OF A CLASS 1 MINI-PROTEIN

To obtain a library of binding domains that are conformationally constrained by a single disulfide, we insert DNA coding for the following family of mini-proteins into the gene coding for a suitable OSP. ##STR30## Where indicates disulfide bonding; this mini-protein is depicted in FIG. 3. Disulfides normally do not form between cysteines that are consecutive on the polypeptide chain. One or more of the residues indicated above as X_(n) will be varied extensively to obtain novel binding. There may be one or more amino acids that precede X₁ or follow X₈; however, these additional residues will not be significantly constrained by the diagrammed disulfide bridge, and it is less advantageous to vary these remote, unbridged residues. The last X residue is connected to the OSP of the genetic package.

X₁, X₂, X₃, X₄, X₅, X₆, X₇, and X₈ can be varied independently; i.e. a different scheme of variegation could be used at each position. X₁ and X₈ are the least constrained residues and may be varied less than other positions.

X₁ and X₈ can be, for example, one of the amino acids [E, K, T, and A]; this set of amino acids is preferred because: a) the possibility of positively charged, negatively charged, and neutral amino acids is provided, b) these amino acids can be provided in 1:1:1:1 ratio via the codon RMG (R = equimolar A and G, M = equimolar A and C), and c) these amino acids allow proper processing by signal peptidases.

One option for variegation of X₂, X₃, X₄, X₅, X₆, and X₇ is to vary all of these in the same way. For example, each of X₂, X₃, X₄, X₅, X₆, and X₇ can be chosen from the set [F, S, Y, C, L, P, H, R, I, T, N, V, A, D, and G] which is encoded by the mixed codon NNT. Tables 10 and 130 compare libraries in which six codons have been varied either by NNT or NNK codons. NNT encodes 15 different amino acids using only 16 DNA sequences. Thus, there are 1.139·10⁷ amino-acid sequences, no stops, and only 1.678·10⁷ DNA sequences. A library of 10⁸ independent transformants will contain 99% of all possible sequences. The NNK library contains 6.4·10⁷ sequences, but complete sampling requires a much larger number of independent transformants.
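The counts quoted for NNT and NNK can be reproduced by expanding each mixed codon with the IUB codes of Table 1 and translating with the standard genetic code. The following is a minimal sketch; the function and variable names are illustrative assumptions, and the translation table is the standard code written with bases in the order T, C, A, G.

```python
from itertools import product

# IUB ambiguity codes, as listed in Table 1.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

# Standard genetic code, bases ordered T, C, A, G; "." marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(mixed_codon):
    """Translate every DNA codon covered by a mixed codon such as 'NNT'."""
    choices = (IUB[base] for base in mixed_codon)
    return [CODE["".join(codon)] for codon in product(*choices)]


def summarize(mixed_codon, positions):
    """Count codons, amino acids, and stops, then raise to `positions`."""
    residues = expand(mixed_codon)
    amino_acids = set(residues) - {"."}
    return {
        "codons": len(residues),
        "amino_acids": len(amino_acids),
        "stops": residues.count("."),
        "dna_sequences": len(residues) ** positions,
        "protein_sequences": len(amino_acids) ** positions,
    }


nnt = summarize("NNT", 6)
nnk = summarize("NNK", 6)
for name, stats in (("NNT", nnt), ("NNK", nnk)):
    print(name, stats)
```

For six NNT positions this gives 15⁶ amino-acid sequences and 16⁶ DNA sequences, matching the 1.139·10⁷ and 1.678·10⁷ figures in the text; NNK covers all twenty amino acids but needs 32 codons per position, one of which is a stop.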

EXAMPLE XIII A CYS::HELIX::TURN::STRAND::CYS UNIT

The parental Class 2 mini-protein may be a naturally-occurring Class 2 mini-protein. It may also be a domain of a larger protein whose structure satisfies or may be modified so as to satisfy the criteria of a class 2 mini-protein. The modification may be a simple one, such as the introduction of a cysteine (or a pair of cysteines) into the base of a hairpin structure so that the hairpin may be closed off with a disulfide bond, or a more elaborate one, such as the modification of intermediate residues so as to achieve the hairpin structure. The parental class 2 mini-protein may also be a composite of structures from two or more naturally-occurring proteins, e.g., an α helix of one protein and a β strand of a second protein.

One mini-protein motif of potential use comprises a disulfide loop enclosing a helix, a turn, and a return strand. Such a structure could be designed or it could be obtained from a protein of known 3D structure. Scorpion neurotoxin, variant 3, (ALMA83a, ALMA83b) (hereafter ScorpTx) contains a structure diagrammed in FIG. 15 that comprises a helix (residues N22 through N33), a turn (residues 33 through 35), and a return strand (residues 36 through 41). ScorpTx contains disulfides that join residues 12-65, 16-41, 25-46, and 29-48. CYS₂₅ and CYS₄₁ are quite close and could be joined by a disulfide without deranging the main chain. FIG. 15 shows CYS₂₅ joined to CYS₄₁. In addition, CYS₂₉ has been changed to GLN. It is expected that a disulfide will form between 25 and 41 and that the helix shown will form; we know that the amino-acid sequence shown is highly compatible with this structure. The presence of GLY₃₅, GLY₃₆, and GLY₃₉ gives the turn and extended strand sufficient flexibility to accommodate any changes needed around CYS₄₁ to form the disulfide.

From examination of this structure (as found in entry 1SN3 of the Brookhaven Protein Data Bank), we see that the following sets of residues would be preferred for variegation:

    ______________________________________________________
    SET 1
    Residue    Codon    Allowed amino acids      Naa/Ndna
    ______________________________________________________
    1) T₂₇     NNG      L² R² MVSPTAQKEWG.       13/15
    2) E₂₈     VHG      LMVPTAQKE                9/9
    3) A₃₁     VHG      LMVPTAQKE                9/9
    4) K₃₂     VHG      LMVPTAQKE                9/9
    5) G₂₄     NNG      L² R² MVSPTAQKEWG.       13/15
    6) E₂₃     VHG      LMVPTAQKE                9/9
    7) Q₃₄     VAS      HQNKED                   6/6
    ______________________________________________________
    Note: Exponents on amino acids indicate multiplicity of codons.

Positions 27, 28, 31, 32, 24, and 23 comprise one face of the helix. At each of these locations we have picked a variegating codon that a) includes the parental amino acid, b) includes a set of residues having a predominance of helix favoring residues, c) provides for a wide variety of amino acids, and d) leads to as even a distribution as possible. Position 34 is part of a turn. The side group of residue 34 could interact with molecules that contact the side groups of residues 27, 28, 31, 32, 24, and 23. Thus we allow variegation here and provide amino acids that are compatible with turns. The variegation shown leads to 6.65·10⁶ amino acid sequences encoded by 8.85·10⁶ DNA sequences.
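The library size for SET 1 is simply the product of the per-position counts in the Naa/Ndna column. A minimal sketch follows; the tuple layout and variable names are assumptions for illustration, and the counts are those tabulated above (Ndna excludes the stop codon that NNG also encodes).

```python
import math

# (residue, mixed codon, amino acids allowed, non-stop DNA codons) for SET 1.
SET_1 = [
    ("T27", "NNG", 13, 15),
    ("E28", "VHG", 9, 9),
    ("A31", "VHG", 9, 9),
    ("K32", "VHG", 9, 9),
    ("G24", "NNG", 13, 15),
    ("E23", "VHG", 9, 9),
    ("Q34", "VAS", 6, 6),
]

# Positions vary independently, so the totals are products over positions.
n_protein = math.prod(naa for _, _, naa, _ in SET_1)
n_dna = math.prod(ndna for _, _, _, ndna in SET_1)

print(f"amino-acid sequences: {n_protein:,} (about {n_protein:.2e})")
print(f"DNA sequences:        {n_dna:,} (about {n_dna:.2e})")
```

The products are 6,652,854 amino-acid sequences and 8,857,350 DNA sequences, in line with the 6.65·10⁶ and 8.85·10⁶ quoted in the text (the latter appears to be truncated rather than rounded).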

    ______________________________________________________
    SET 2
    Residue    Codon    Allowed amino acids      Naa/Ndna
    ______________________________________________________
    1) D₂₆     VHS      L² IMV² P² T² A² HQNKDE  13/18
    2) T₂₇     NNG      L² R² MVSPTAQKEWG.       13/15
    3) K₃₀     VHG      KEQPTALMV                9/9
    4) A₃₁     VHG      KEQPTALMV                9/9
    5) K₃₂     VHG      LMVPTAGKE                9/9
    6) S₃₇     RRT      SNDG                     4/4
    7) Y₃₈     NHT      YSFHPLNTIDAV             9/9
    ______________________________________________________

Positions 26, 27, 30, 31, and 32 are variegated so as to enhance helix-favoring amino acids in the population. Residues 37 and 38 are in the return strand so that we pick different variegation codons. This variegation allows 4.43·10⁶ amino-acid sequences and 7.08·10⁶ DNA sequences. Thus a library that embodies this scheme can be sampled very efficiently.

EXAMPLE XIV DESIGN AND MUTAGENESIS OF CLASS 3 MINI-PROTEIN

Two Disulfide Bond Parental Mini-Proteins

Mini-proteins with two disulfide bonds may be modelled after the α-conotoxins, e.g., GI, GIA, GII, MI, and SI. These have the following conserved structure: ##STR31##

Hashimoto et al. (HASH85) reported synthesis of twenty-four analogues of conotoxins GI, GII, and MI. Using the numbering scheme for GI (CYS at positions 2, 3, 7, and 13), Hashimoto et al. reported alterations at 4, 8, 10, and 12 that allow the proteins to be toxic. Almquist et al. (ALMQ89) synthesized [des-GLU₁] Conotoxin GI and twenty analogues. They found that substituting GLY for PRO₅ gave rise to two isomers, perhaps related to different disulfide bonding. They found a number of substitutions at residues 8 through 11 that allowed the protein to be toxic. Zafaralla et al. (ZAFA88) found that substituting PRO at position 9 gives an active protein. Each of the groups cited used only in vivo toxicity as an assay for the activity. From such studies, one can infer that an active protein has the parental 3D structure, but one cannot infer that an inactive protein lacks the parental 3D structure.

Pardi et al. (PARD89) determined the 3D structure of α Conotoxin GI obtained from venom by NMR. Kobayashi et al. (KOBA89) have reported a 3D structure of synthetic α Conotoxin GI from NMR data which agrees with that of PARD89. We refer to FIG. 5 of Pardi et al.

Residue GLU₁ is known to accommodate GLU, ARG, and ILE in known analogues or homologues. A preferred variegation codon is NNG that allows the set of amino acids [L² R² MVSPTAQKEWG<stop>]. From FIG. 5 of Pardi et al. we see that the side group of GLU₁ projects into the same region as the strand comprising residues 9 through 12. Residues 2 and 3 are cysteines and are not to be varied. The side group of residue 4 points away from residues 9 through 12; thus we defer varying this residue until a later round. PRO₅ may be needed to cause the correct disulfides to form; when GLY was substituted here the peptide folded into two forms, neither of which is toxic. It is allowed to vary PRO₅, but not preferred in the first round.

No substitutions at ALA₆ have been reported. A preferred variegation codon is RMG which gives rise to ALA, THR, LYS, and GLU (small hydrophobic, small hydrophilic, positive, and negative). CYS₇ is not varied. We prefer to leave GLY₈ as is, although a homologous protein having ALA₈ is toxic. Homologous proteins having various amino acids at position 9 are toxic; thus, we use an NNT variegation codon which allows FS² YCLPHRITNVADG. We use NNT at positions 10, 11, and 12 as well. At position 14, following the fourth CYS, we allow ALA, THR, LYS, or GLU (via an RMG codon). This variegation allows 1.053·10⁷ amino-acid sequences, encoded by 1.68·10⁷ DNA sequences. Libraries having 2.0·10⁷, 3.0·10⁷, and 5.0·10⁷ independent transformants will, respectively, display ≈70%, ≈83%, and ≈95% of the allowed sequences. Other variegations are also appropriate. Concerning α conotoxins, see, inter alia, ALMQ89, CRUZ85, GRAY83, GRAY84, and PARD89.
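The percentages quoted for the three library sizes can be reproduced with a simple sampling estimate. The sketch below assumes that every DNA sequence is equally likely and that transformants are independent, so that the expected fraction of sequences present is 1 − exp(−N/S); this formula and the names used are assumptions for illustration, not a statement of how the figures in the text were derived. The per-position codon counts follow the variegation just described, with the NNG stop codon counted among the DNA sequences.

```python
import math


def expected_coverage(transformants: float, n_sequences: float) -> float:
    """Expected fraction of sequences seen at least once (equiprobable model)."""
    return 1.0 - math.exp(-transformants / n_sequences)


# Positions 1, 6, 9, 10, 11, 12, 14: NNG, RMG, NNT, NNT, NNT, NNT, RMG.
dna_codons = [16, 4, 16, 16, 16, 16, 4]
amino_acids = [13, 4, 15, 15, 15, 15, 4]

n_dna = math.prod(dna_codons)
n_protein = math.prod(amino_acids)
print(f"DNA sequences:        {n_dna:.3e}")
print(f"amino-acid sequences: {n_protein:.3e}")

coverage = {n: expected_coverage(n, n_dna) for n in (2.0e7, 3.0e7, 5.0e7)}
for size, fraction in coverage.items():
    print(f"{size:.1e} transformants -> {fraction:.1%} of DNA sequences")
```

Under these assumptions the three library sizes give roughly 70%, 83%, and 95%, in agreement with the figures stated above.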

The parental mini-protein may instead be one of the proteins designated "Hybrid-I" and "Hybrid-II" by Pease et al. (PEAS90); cf. FIG. 4 of PEAS90. One preferred set of residues to vary for either protein consists of:

    ______________________________________________________
    Parental      Variegated    Allowed              AA seqs/
    Amino acid    Codon         Amino acids          DNA seqs
    ______________________________________________________
    A5            RVT           ADGTNS               6/6
    P6            VYT           PTALIV               6/6
    E7            RRS           EDNKSRG²             7/8
    T8            VHG           TPALMVQKE            9/9
    A9            VHG           ATPLMVQKE            9/9
    A10           RMG           AEKT                 4/4
    K12           VHG           KQETPALMV            9/9
    Q16           NNG           L² R² S.WPQMTKVAEG   13/15
    ______________________________________________________
    (RVT.VYT.RRS.VHG.VHG.RMG has SEQ ID NO: 106)

This provides 9.55·10⁶ amino-acid sequences encoded by 1.26·10⁷ DNA sequences. A library comprising 5.0·10⁷ transformants allows expression of 98.2% of all possible sequences. At each position, the parental amino acid is allowed.

At position 5 we provide amino acids that are compatible with a turn. At position 6 we allow ILE and VAL because they have branched β carbons and make the chain rigid. At position 7 we allow ASP, ASN, and SER that often appear at the amino termini of helices. At positions 8 and 9 we allow several helix-favoring amino acids (ALA, LEU, MET, GLN, GLU, and LYS) that have differing charges and hydrophobicities because these are part of the helix proper. Position 10 is further around the edge of the helix, so we allow a smaller set (ALA, THR, LYS, and GLU). This set not only includes 3 helix-favoring amino acids plus THR that is well tolerated but also allows positive, negative, and neutral hydrophilic. The side groups of 12 and 16 project into the same region as the residues already recited. At these positions we allow a wide variety of amino acids with a bias toward helix-favoring amino acids.

The parental mini-protein may instead be a polypeptide composed of residues 9-24 and 31-40 of aprotinin and possessing two disulfides (Cys9-Cys22 and Cys14-Cys38). Such a polypeptide would have the same disulfide bond topology as α-conotoxin, and its two bridges would have spans of 12 and 17, respectively.

Residues 23, 24 and 31 are variegated to encode the amino acid residue set [G,S,R,D,N,H,P,T,A] so that a sequence that favors a turn of the necessary geometry is found. We use trypsin or anhydrotrypsin as the affinity molecule to enrich for GPs that display a mini-protein that folds into a stable structure similar to BPTI in the P1 region.

Three Disulfide Bond Parental Mini-Proteins

The cone snails (Conus) produce venoms (conotoxins) which are 10-30 amino acids in length and exceptionally rich in disulfide bonds. They are therefore archetypal mini-proteins. Novel mini-proteins with three disulfide bonds may be modelled after the μ-(GIIIA, GIIIB, GIIIC) or Ω-(GVIA, GVIB, GVIC, GVIIA, GVIIB, MVIIA, MVIIB, etc.) conotoxins. The μ-conotoxins have the following conserved structure: ##STR32##

No 3D structure of a μ-conotoxin has been published. Hidaka et al. (HIDA90) have established the connectivity of the disulfides. The following diagram depicts geographutoxin I (also known as μ-conotoxin GIIIA). ##STR33## The connection from R19 to C20 could go over or under the strand from Q14 to C15. One preferred form of variegation is to vary the residues in one loop. Because the longest loop contains only five amino acids, it is appropriate to also vary the residues connected to the cysteines that form the loop. For example, we might vary residues 5 through 9 plus 2, 11, 19, and 22. Another useful variegation would be to vary residues 11-14 and 16-19, through eight amino acids. Concerning μ conotoxins, see BECK89b, BECK89c, CRUZ89, and HIDA90.

The Ω-conotoxins may be represented as follows: ##STR34## The King Kong peptide has the same disulfide arrangement as the Ω-conotoxins but a different biological activity. Woodward et al. (WOOD90) report the sequences of three homologous proteins from C. textile. Within the mature toxin domain, only the cysteines are conserved. The spacing of the cysteines is exactly conserved, but no other position has the same amino acid in all three sequences and only a few positions show even pair-wise matches. Thus we conclude that all positions (except the cysteines) may be substituted freely with a high probability that a stable disulfide structure will form. Concerning Ω conotoxins, see HILL89 and SUNX87.

Another mini-protein which may be used as a parental binding domain is the Cucurbita maxima trypsin inhibitor I (CMTI-I); CMTI-III is also appropriate. They are members of the squash family of serine protease inhibitors, which also includes inhibitors from summer squash, zucchini, and cucumbers (WIEC85). McWherter et al. (MCWH89) describe synthetic sequence-variants of the squash-seed protease inhibitors that have affinity for human leukocyte elastase and cathepsin G. Of course, any member of this family might be used.

CMTI-I is one of the smallest proteins known, comprising only 29 amino acids held in a fixed conformation by three disulfide bonds. The structure has been studied by Bode and colleagues using both X-ray diffraction (BODE89) and NMR (HOLA89a,b). CMTI-I is of ellipsoidal shape; it lacks helices or β-sheets, but consists of turns and connecting short polypeptide stretches. The disulfide pairing is Cys3-Cys20, Cys10-Cys22 and Cys16-Cys28. In the CMTI-I:trypsin complex studied by Bode et al., 13 of the 29 inhibitor residues are in direct contact with trypsin; most of them are in the primary binding segment Val2(P4)-Glu9(P4') which contains the reactive site bond Arg5(P1)-Ile6 and is in a conformation observed also for other serine proteinase inhibitors.

CMTI-I has a K_(i) for trypsin of ≈1.5·10⁻¹² M. McWherter et al. suggested substitution of "moderately bulky hydrophobic groups" at P1 to confer HLE specificity. They found that a wider set of residues (VAL, ILE, LEU, ALA, PHE, MET, and GLY) gave detectable binding to HLE. For cathepsin G, they expected bulky (especially aromatic) side groups to be strongly preferred. They found that PHE, LEU, MET, and ALA were functional by their criteria; they did not test TRP, TYR, or HIS. (Note that ALA has the second smallest side group available.)

A preferred initial variegation strategy would be to vary some or all of the residues ARG₁, VAL₂, PRO₄, ARG₅, ILE₆, LEU₇, MET₈, GLU₉, LYS₁₁, HIS₂₅, GLY₂₆, TYR₂₇, and GLY₂₉. If the target were HNE, for example, one could synthesize DNA embodying the following possibilities:

    ______________________________________________________
                vg        Allowed          #AA seqs/
    Parental    Codon     amino acids      #DNA seqs
    ______________________________________________________
    ARG₁        VNT       RSLPHITNVADG     12/12
    VAL₂        NWT       VILFYHND         8/8
    PRO₄        VYT       PLTIAV           6/6
    ARG₅        VNT       RSLPHITNVADG     12/12
    ILE₆        NNK       all 20           20/31
    LEU₇        VWG       LQMKVE           6/6
    TYR₂₇       NAS       YHQNKDE.         7/8
    ______________________________________________________
    (VYT.VNT.NNK.VWG has SEQ ID NO: 107)

This allows about 5.81·10⁶ amino-acid sequences encoded by about 1.03·10⁷ DNA sequences. A library comprising 5.0·10⁷ independent transformants would give ≈99% of the possible sequences. Other variegation schemes could also be used.

Other inhibitors of this family include: Trypsin inhibitor I from Citrullus vulgaris (OTLE87), Trypsin inhibitor II from Bryonia dioica (OTLE87), Trypsin inhibitor I from Cucurbita maxima (in OTLE87), trypsin inhibitor III from Cucurbita maxima (in OTLE87), trypsin inhibitor IV from Cucurbita maxima (in OTLE87), trypsin inhibitor II from Cucurbita pepo (in OTLE87), trypsin inhibitor III from Cucurbita pepo (in OTLE87), trypsin inhibitor IIb from Cucumis sativus (in OTLE87), trypsin inhibitor IV from Cucumis sativus (in OTLE87), trypsin inhibitor II from Ecballium elaterium (FAVE89), and inhibitor CM-1 from Momordica repens (in OTLE87).

Another mini-protein that may be used as an initial potential binding domain is one of the heat-stable enterotoxins derived from some enterotoxigenic E. coli, Citrobacter freundii, and other bacteria (GUAR89). These mini-proteins are known to be secreted from E. coli and are extremely stable. Works related to synthesis, cloning, expression and properties of these proteins include: BHAT86, SEKI85, SHIM87, TAKA85, TAKE90, THOM85a,b, YOSH85, DALL90, DWAR89, GARI87, GUZM89, GUZM90, HOUG84, KUBO89, KUPE90, OKAM87, OKAM88, and OKAM90.

Another preferred IPBD is crambin or one of its homologues, the phoratoxins and ligatoxins (LECO87). These proteins are secreted in plants. The 3D structure of crambin has been determined. NMR data on homologues indicate that the 3D structure is conserved. Residues thought to be on the surface of crambin, phoratoxin, or ligatoxin are preferred residues to vary.

EXAMPLE XV A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF CU(II), ONE CYSTEINE, TWO HISTIDINES, AND ONE METHIONINE

Sequences such as

    HIS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-CYS

    and

    CYS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-HIS

are likely to combine with Cu(II) to form structures as shown in the diagram: ##STR35## Other arrangements of HIS, MET, HIS, and CYS along the chain are also likely to form similar structures. The amino acids ASN-GLY at positions 2 and 3 and at positions 12 and 13 give the amino acids that carry the metal-binding ligands enough flexibility for them to come together and bind the metal. Other connecting sequences may be used, e.g. GLY-ASN, SER-GLY, GLY-PRO, GLY-PRO-GLY, or PRO-GLY-ASN. It is also possible to vary one or more residues in the loops that join the first and second or the third and fourth metal-binding residues. For example, ##STR36## is likely to form the diagrammed structure for a wide variety of amino acids at Xaa4. It is expected that the side groups of Xaa4 and Xaa6 will be close together and on the surface of the mini-protein.

The variable amino acids are held so that they have limited flexibility. This cross-linkage has some differences from the disulfide linkage. The separation between C_(α4) and C_(α11) is greater than the separation of the C_(α)s of a cystine. In addition, the interaction of residues 1 through 4 and 11 through 14 with the metal ion is expected to limit the motion of residues 5 through 10 more than a disulfide between residues 4 and 11. A single disulfide bond exerts strong distance constraints on the α carbons of the joined residues, but very little directional constraint on, for example, the vector from N to C in the main-chain.

For the desired sequence, the side groups of residues 5 through 10 can form specific interactions with the target. Other numbers of variable amino acids, for example, 4, 5, 7, or 3, are appropriate. Larger spans may be used when the enclosed sequence contains segments having a high potential to form α helices or other secondary structure that limits the conformational freedom of the polypeptide main chain. Whereas a mini-protein having four CYSs could form three distinct pairings, a mini-protein having two HISs, one MET, and one CYS can form only two distinct complexes with Cu. These two structures are related by mirror symmetry through the Cu. Because the two HISs are distinguishable, the structures are different.
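The count of three pairings for four cysteines can be checked by enumerating every way of splitting the cysteines into disulfide-bonded pairs. The following is a minimal sketch; the function name is an illustrative assumption, the labels use the cysteine positions of conotoxin GI given earlier (2, 3, 7, and 13), and the enumeration is purely combinatorial, ignoring whether a given pairing is sterically feasible.

```python
def pairings(items):
    """Yield every way to split `items` into unordered pairs."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Pair the first item with each possible partner, then recurse on the rest.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


four_cys = list(pairings(["C2", "C3", "C7", "C13"]))
six_cys = list(pairings(range(6)))

for arrangement in four_cys:
    print(arrangement)
print(len(four_cys), "pairings for four cysteines")
print(len(six_cys), "pairings for six cysteines")
```

The same enumeration gives fifteen combinatorial pairings for a six-cysteine mini-protein such as those with three disulfides discussed in Example XIV, which illustrates why restricting the number of possible cross-link arrangements can be attractive.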

When such metal-containing mini-proteins are displayed on filamentous phage, the cells that produce the phage can be grown in the presence of the appropriate metal ion, or the phage can be exposed to the metal only after they are separated from the cells.

EXAMPLE XVI A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF ZN(II) AND FOUR CYSTEINES

A cross link similar to the one shown in Example XV is exemplified by the Zinc-finger proteins (GIBS88, GAUS87, PARR88, FRAN87, CHOW87, HARD90). One family of Zinc-fingers has two CYS and two HIS residues in conserved positions that bind Zn⁺⁺ (PARR88, FRAN87, CHOW87, EVAN88, BERG88, CHAV88). Gibson et al. (GIBS88) review a number of sequences thought to form zinc-fingers and propose a three-dimensional model for these compounds. Most of these sequences have two CYS and two HIS residues in conserved positions, but some have three CYS and one HIS residue. Gauss et al. (GAUS87) also report a zinc-finger protein having three CYS and one HIS residues that bind zinc. Hard et al. (HARD90) report the 3D structure of a protein that comprises two zinc-fingers, each of which has four CYS residues. All of these zinc-binding proteins are stable in the reducing intracellular environment.

One preferred example of a CYS::zinc cross linked mini-protein comprises residues 440 to 461 of the sequence shown in FIG. 1 of HARD90. The residues 444 through 456 may be variegated. One such variegation is as follows:

    ______________________________________________________
    Parental    Allowed                           #AA/#DNA
    ______________________________________________________
    SER444      SER, ALA                          2/2
    ASP445      ASP, ASN, GLU, LYS                4/4
    GLU446      GLU, LYS, GLN                     3/3
    ALA447      ALA, THR, GLY, SER                4/4
    SER448      SER, ALA                          2/2
    GLY449      GLY, SER, ASN, ASP                4/4
    CYS450      CYS, PHE, ARG, LEU                4/4
    HIS451      HIS, GLN, ASN, LYS, ASP, GLU      6/6
    TYR452      TYR, PHE, HIS, LEU                4/4
    GLY453      GLY, SER, ASN, ASP                4/4
    VAL454      VAL, ALA, ASP, GLY, SER, ASN,     8/8
                THR, ILE
    LEU455      LEU, HIS, ASP, VAL                4/4
    THR456      THR, ILE, ASN, SER                4/4
    ______________________________________________________
    (Ser444-Thr456 has SEQ ID NO: 43)

This leads to 3.77·10⁷ DNA sequences that encode the same number of amino-acid sequences. A library having 1.0·10⁸ independent transformants will display 93% of the allowed sequences; 2.0·10⁸ independent transformants will display 99.5% of allowed sequences.
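The relation between library size and coverage can also be turned around to ask how many independent transformants a target coverage would call for. The sketch below assumes equiprobable sequences and independent sampling, so that coverage = 1 − exp(−N/S) and therefore N = −S·ln(1 − coverage); the formula and the names used are assumptions for illustration. The per-residue counts are those of the table for residues 444 through 456.

```python
import math

# Number of amino acids (equal to number of codons) allowed at each residue.
ALLOWED = {
    "SER444": 2, "ASP445": 4, "GLU446": 3, "ALA447": 4, "SER448": 2,
    "GLY449": 4, "CYS450": 4, "HIS451": 6, "TYR452": 4, "GLY453": 4,
    "VAL454": 8, "LEU455": 4, "THR456": 4,
}

n_sequences = math.prod(ALLOWED.values())


def transformants_needed(n_seq: float, coverage: float) -> float:
    """Library size giving the requested expected coverage (0 < coverage < 1)."""
    return -n_seq * math.log(1.0 - coverage)


need_93 = transformants_needed(n_sequences, 0.93)
need_995 = transformants_needed(n_sequences, 0.995)
print(f"allowed sequences: {n_sequences:,}")
print(f"93% coverage:   about {need_93:.2e} transformants")
print(f"99.5% coverage: about {need_995:.2e} transformants")
```

Under these assumptions the product of the per-residue counts is 37,748,736, and the library sizes for 93% and 99.5% coverage come out near 1.0·10⁸ and 2.0·10⁸, consistent with the figures in the text.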

                  TABLE 1
______________________________________
Single-letter codes.
______________________________________
Single-letter code is used for proteins:
a = ALA   c = CYS   d = ASP   e = GLU   f = PHE
g = GLY   h = HIS   i = ILE   k = LYS   l = LEU
m = MET   n = ASN   p = PRO   q = GLN   r = ARG
s = SER   t = THR   v = VAL   w = TRP   y = TYR
. = STOP            * = any amino acid
b = n or d
z = e or q
x = any amino acid
Single-letter IUB codes for DNA:
T, C, A, G stand for themselves
M for A or C
R for puRines A or G
W for A or T
S for C or G
Y for pYrimidines T or C
K for G or T
V for A, C, or G (not T)
H for A, C, or T (not G)
D for A, G, or T (not C)
B for C, G, or T (not A)
N for any base.
______________________________________
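The IUB codes of Table 1 can be expanded mechanically. The following is a minimal sketch in Python, not part of the original disclosure; the names `IUB` and `expand` are hypothetical and chosen only for illustration.

```python
from itertools import product

# IUB ambiguity codes exactly as listed in Table 1.
IUB = {
    "T": "T", "C": "C", "A": "A", "G": "G",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "TC", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "TCAG",
}


def expand(ambiguous):
    """Return every unambiguous DNA sequence matched by an IUB-coded string."""
    choices = (IUB[base] for base in ambiguous.upper())
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    # RAY covers the two Asn codons and the two Asp codons ("b = n or d").
    print(expand("RAY"))
    # NNK stands for 4 * 4 * 2 distinct codons.
    print(len(expand("NNK")))
```

For example, `expand("RAY")` lists AAT, AAC, GAT and GAC, which are the codons for the protein code b (n or d) in the same table.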

                  TABLE 2
______________________________________
Preferred Outer-Surface Proteins

Genetic       Preferred Outer-
Package       Surface Protein    Reason for preference
______________________________________
M13           coat protein       a) exposed amino terminus,
              (gpVIII)           b) predictable post-translational processing,
                                 c) numerous copies in virion.
                                 d) fusion data available
              gp III             a) fusion data available.
                                 b) amino terminus exposed.
                                 c) working example available.
PhiX174       G protein          a) known to be on virion exterior,
                                 b) small enough that the G-ipbd gene can
                                    replace H gene.
E. coli       LamB               a) fusion data available,
                                 b) non-essential.
              OmpC               a) topological model
                                 b) non-essential; abundant
              OmpA               a) topological model
                                 b) non-essential; abundant
                                 c) homologues in other genera
              OmpF               a) topological model
                                 b) non-essential; abundant
              PhoE               a) topological model
                                 b) non-essential; abundant
                                 c) inducible
B. subtilis   CotC               a) no post-translational processing,
spores                           b) distinctive sequence that causes protein
                                    to localize in spore coat,
                                 c) non-essential.
              CotD               Same as for CotC.
______________________________________

                  TABLE 3
______________________________________
Ambiguous DNA for AA_seq2
______________________________________
##STR37## through ##STR53##
______________________________________

                  TABLE 4
______________________________________
Table of Restriction Enzyme Suppliers
       Suppliers:
______________________________________
       Sigma Chemical Co.
       P.O. Box 14508
       St. Louis, Mo. 63178

       Bethesda Research Laboratories
       P.O. Box 6009
       Gaithersburg, Maryland, 20877

       Boehringer Mannheim Biochemicals
       7941 Castleway Drive
       Indianapolis, Indiana, 46250

       International Biochemicals, Inc.
       P.O. Box 9558
       New Haven, Connecticut, 06535

       New England BioLabs
       32 Tozer Road
       Beverly, Massachusetts, 01915

       Promega
       2800 S. Fish Hatchery Road
       Madison, Wisconsin, 53711

       Stratagene Cloning Systems
       11099 North Torrey Pines Road
       La Jolla, California, 92037
______________________________________

                  TABLE 5
______________________________________
##STR54##
______________________________________
Summary of cuts.
##STR55## through ##STR89##
______________________________________

                                      TABLE 6
__________________________________________________________________________
Exposure of amino acid types in T4 lzm & HEWL.
__________________________________________________________________________
HEADER HYDROLASE (O-GLYCOSYL) 18-AUG-86                               2LZM
COMPND LYSOZYME (E.C.3.2.1.17)
AUTHOR L. H. WEAVER, B. W. MATTHEWS
Coordinates from Brookhaven Protein Data Bank:                        1LYM.
Only Molecule A was considered.
HEADER HYDROLASE (O-GLYCOSYL) 29-JUL-82                               1LYM
COMPND LYSOZYME (E.C.3.2.1.17)
AUTHOR J. HOGLE, S. T. RAO, M. SUNDARALINGAM
Solvent radius = 1.40   Atomic radii in Table 7.
Surface area measured in Å².

                                                   Max
Type    N   <area>   sigma    max      min      exposed (fraction)
__________________________________________________________________________
ALA    27   211.0    1.47    214.3    207.1      85.1 (0.40)
CYS    10   239.8    3.56    245.5    234.4      38.3 (0.16)
ASP    17   271.1    5.36    281.4    262.5     127.1 (0.47)
GLU    10   297.2    5.78    304.9    285.4     100.7 (0.34)
PHE     8   316.6    5.92    325.4    307.5      99.8 (0.32)
GLY    23   185.5    1.31    188.3    183.3      91.9 (0.50)
HIS     2   297.7    3.23    301.0    294.5      32.9 (0.11)
ILE    16   278.1    3.61    285.6    269.6      57.5 (0.21)
LYS    19   309.2    5.38    321.9    300.1     147.1 (0.48)
LEU    24   282.6    6.75    304.0    269.8     109.9 (0.39)
MET     7   293.0    5.70    299.5    283.1      88.2 (0.30)
ASN    26   273.0    5.75    285.1    262.6     143.4 (0.53)
PRO     5   239.9    2.75    242.1    234.6     128.7 (0.54)
GLN     8   299.5    4.75    305.8    291.5     145.9 (0.49)
ARG    24   344.7    8.66    355.8    326.7     240.7 (0.70)
SER    16   228.6    3.59    236.6    223.3      98.2 (0.43)
THR    18   250.3    3.89    257.2    244.2     139.9 (0.56)
VAL    15   254.3    4.05    261.8    245.7     111.1 (0.44)
TRP     9   359.4    3.38    366.4    355.1     102.0 (0.28)
TYR     9   335.8    4.97    342.0    325.0      72.6 (0.22)
__________________________________________________________________________
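The parenthesized fraction in the last column of Table 6 appears to be the maximum exposed area divided by the mean area <area> for that residue type. That reading is an inference from the tabulated numbers, not a statement made in the table itself. A minimal Python sketch checking it on a few rows follows; `TABLE6_ROWS` and `exposed_fraction` are hypothetical names.

```python
# (mean area, maximum exposed area) in square angstroms, copied from Table 6.
TABLE6_ROWS = {
    "ALA": (211.0, 85.1),
    "CYS": (239.8, 38.3),
    "HIS": (297.7, 32.9),
    "ARG": (344.7, 240.7),
    "THR": (250.3, 139.9),
    "TYR": (335.8, 72.6),
}


def exposed_fraction(residue):
    """Maximum exposed area as a fraction of the mean area (assumed definition)."""
    mean_area, max_exposed = TABLE6_ROWS[residue]
    return max_exposed / mean_area


if __name__ == "__main__":
    for name in TABLE6_ROWS:
        print(name, round(exposed_fraction(name), 2))
```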

                  TABLE 7
______________________________________
Atomic radii
                       Å
______________________________________
       Cα              1.70
       O (carbonyl)    1.52
       N (amide)       1.55
       Other atoms     1.80
______________________________________

                  TABLE 8
______________________________________
Fraction of DNA molecules having n non-parental bases
when reagents that have fraction M of parental
nucleotide are used.

M      .9965     .97716      .92612   .8577       .79433   .63096
______________________________________
f0     .9000     .5000       .1000    .0100       .0010    .000001
f1     .09499    .35061      .2393    .04977      .00777   .0000175
f2     .00485    .1188       .2768    .1197       .0292    .000149
f3     .00016    .0259       .2061    .1854       .0705    .000812
f4     .000004   .00409      .1110    .2077       .1232    .003207
f8     0.        2 · 10^-7   .00096   .0336       .1182    .080165
f16    0.        0.          0.       5 · 10^-7   .00006   .027281
f23    0.        0.          0.       0.          0.       .0000089
most   0         0           2        5           7        12
______________________________________
"most" is the value of n having the highest probability.
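The f rows of Table 8 are consistent with a binomial distribution in which each base is independently parental with probability M. The sketch below assumes a variegated stretch of 30 bases. That length is not stated in the table; it is inferred from the f0 row, where M raised to the 30th power reproduces .9000, .5000, .1000 and so on. The names `N_BASES` and `fraction_with_n_changes` are hypothetical.

```python
from math import comb

# Assumption: 30 variegated bases, inferred from the f0 row of Table 8.
N_BASES = 30


def fraction_with_n_changes(m, n, length=N_BASES):
    """Binomial probability that exactly n of `length` bases are non-parental.

    m is the fraction of parental nucleotide in each reagent mixture.
    """
    return comb(length, n) * (1.0 - m) ** n * m ** (length - n)


if __name__ == "__main__":
    for m in (0.9965, 0.97716, 0.92612):
        row = [fraction_with_n_changes(m, n) for n in range(5)]
        print(m, ["%.5f" % f for f in row])
```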
          

                                      TABLE 9
__________________________________________________________________________
best vgCodon
__________________________________________________________________________
Program "Find Optimum vgCodon."
INITIALIZE-MEMORY-OF-ABUNDANCES
DO ( t1 = 0.21 to 0.31 in steps of 0.01 )
. DO ( c1 = 0.13 to 0.23 in steps of 0.01 )
. . DO ( a1 = 0.23 to 0.33 in steps of 0.01 )
Comment  calculate g1 from other concentrations
. . . g1 = 1.0 - t1 - c1 - a1
. . . IF( g1 .ge. 0.15 )
. . . . DO ( a2 = 0.37 to 0.50 in steps of 0.01 )
. . . . . DO ( c2 = 0.12 to 0.20 in steps of 0.01 )
Comment  Force D+E = R + K
. . . . . . g2 = (g1*a2 - .5*a1*a2)/(c1 + 0.5*a1)
Comment  Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( g2 .gt. 0.1 .and. t2 .gt. 0.1 )
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . ..end_IF_block
. . . . . ..end_DO_loop ! c2
. . . . ..end_DO_loop ! a2
. . . ..end_IF_block ! if g1 big enough
. . ..end_DO_loop ! a1
. ..end_DO_loop ! c1
..end_DO_loop ! t1
WRITE the best distribution and the abundances.
__________________________________________________________________________
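The search of Table 9 can be sketched in Python as follows. This is an illustrative reading, not the original program, and it rests on three assumptions:

- the third base is an equimolar T/G mixture, as in the fxS codon of Table 10;
- CALCULATE-ABUNDANCES sums the codon probabilities per amino acid under the standard genetic code;
- COMPARE-ABUNDANCES-TO-PREVIOUS-ONES keeps the distribution with the largest ratio of least-favored to most-favored amino acid.

All function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
# '*' marks a stop codon.
CODE = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
AMINO_ACIDS = sorted(set(CODE) - {"*"})
# Assumption: third base is T or G in equal amounts (row 3 of Table 10A).
THIRD = {"T": 0.5, "C": 0.0, "A": 0.0, "G": 0.5}


def abundances(first, second, third=THIRD):
    """Expected frequency of each amino acid (and '*') for a variegated codon."""
    out = {}
    for i, b1 in enumerate(BASES):
        for j, b2 in enumerate(BASES):
            for k, b3 in enumerate(BASES):
                p = first[b1] * second[b2] * third[b3]
                if p:
                    aa = CODE[16 * i + 4 * j + k]
                    out[aa] = out.get(aa, 0.0) + p
    return out


def lfaa_mfaa_ratio(ab):
    """Abundance of the least-favored amino acid over the most-favored one."""
    values = [ab.get(aa, 0.0) for aa in AMINO_ACIDS]
    return min(values) / max(values)


def find_optimum_vgcodon(t1s=range(21, 32), c1s=range(13, 24),
                         a1s=range(23, 34), a2s=range(37, 51),
                         c2s=range(12, 21)):
    """Grid search following Table 9; loop values are given in hundredths."""
    best = None
    for t1h in t1s:
        for c1h in c1s:
            for a1h in a1s:
                g1h = 100 - t1h - c1h - a1h  # g1 from other concentrations
                if g1h < 15:
                    continue
                t1, c1, a1, g1 = t1h / 100, c1h / 100, a1h / 100, g1h / 100
                for a2h in a2s:
                    for c2h in c2s:
                        a2, c2 = a2h / 100, c2h / 100
                        # Force [D]+[E] = [R]+[K].
                        g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)
                        t2 = 1.0 - a2 - c2 - g2
                        if g2 > 0.1 and t2 > 0.1:
                            first = dict(zip(BASES, (t1, c1, a1, g1)))
                            second = dict(zip(BASES, (t2, c2, a2, g2)))
                            score = lfaa_mfaa_ratio(abundances(first, second))
                            if best is None or score > best[0]:
                                best = (score, first, second)
    return best
```

Calling `find_optimum_vgcodon()` with its defaults walks the full grid of Table 9, which takes a noticeable amount of time in pure Python. Passing narrower ranges restricts the search to part of the grid.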

                  TABLE 10
______________________________________
Abundances obtained
from various vgCodons
______________________________________
A. Optimized fxS Codon, Restrained by [D]+[E] = [K]+[R]
        T        C        A        G
______________________________________
1       .26      .18      .26      .30    f
2       .22      .16      .40      .22    x
3       .5       .0       .0       .5     S
______________________________________
Amino                    Amino
acid      Abundance      acid      Abundance
______________________________________
A         4.80%          C         2.86%
D         6.00%          E         6.00%
F         2.86%          G         6.60%
H         3.60%          I         2.86%
K         5.20%          L         6.82%
M         2.86%          N         5.20%
P         2.88%          Q         3.60%
R         6.82%          S         7.02% mfaa
T         4.16%          V         6.60%
W         2.86% lfaa     Y         5.20%
stop      5.20%
______________________________________
[D] + [E] = [K] + [R] = .12
ratio = Abun(W)/Abun(S) = 0.4074
j    (1/ratio)^j    (ratio)^j       stop-free
______________________________________
1    2.454          .4074           .9480
2    6.025          .1660           .8987
3    14.788         .0676           .8520
4    36.298         .0275           .8077
5    89.095         .0112           .7657
6    218.7          4.57 · 10^-3    .7258
7    536.8          1.86 · 10^-3    .6881
______________________________________
B. Unrestrained, optimized
        T        C        A        G
______________________________________
1       .27      .19      .27      .27
2       .21      .15      .43      .21
3       .5       .0       .0       .5
______________________________________
Amino                    Amino
acid      Abundance      acid      Abundance
______________________________________
A         4.05%          C         2.84%
D         5.81%          E         5.81%
F         2.84%          G         5.67%
H         4.08%          I         2.84%
K         5.81%          L         6.83%
M         2.84%          N         5.81%
P         2.85%          Q         4.08%
R         6.83%          S         6.89% mfaa
T         4.05%          V         5.67%
W         2.84% lfaa     Y         5.81%
stop      5.81%
______________________________________
[D] + [E] = 0.1162  [K] + [R] = 0.1264
ratio = Abun(W)/Abun(S) = 0.41176
j    (1/ratio)^j    (ratio)^j       stop-free
______________________________________
1    2.4286         .41176          .9419
2    5.8981         .16955          .8872
3    14.3241        .06981          .8356
4    34.7875        .02875          .7871
5    84.4849        .011836         .74135
6    205.180        .004874         .69828
7    498.3          2.007 · 10^-3   .6577
______________________________________
C. Optimized NNT
        T        C        A        G
______________________________________
1       .2071    .2929    .2071    .2929
2       .2929    .2071    .2929    .2071
3       1.       .0       .0       .0
______________________________________
Amino                    Amino
acid      Abundance      acid      Abundance
______________________________________
A         6.06%          C         4.29% lfaa
D         8.58%          E         none
F         6.06%          G         6.06%
H         8.58%          I         6.06%
K         none           L         8.58%
M         none           N         6.06%
P         6.06%          Q         none
R         6.06%          S         8.58% mfaa
T         4.29%          V         8.58%
W         none           Y         6.06%
stop      none
______________________________________
j    (1/ratio)^j    (ratio)^j       stop-free
______________________________________
1    2.0            .5              1.
2    4.0            .25             1.
3    8.0            .125            1.
4    16.0           .0625           1.
5    32.0           .03125          1.
           6   64.0           .015625   1.                                               7   128.0          .0078125  1.                                               ______________________________________                                        D. Optimized NNG                                                                      T        C        A      G                                            ______________________________________                                        1       .23      .21      .23     .33                                         2       .215     .285     .285    .215                                        3       .0       .0       .0     1.0                                          ______________________________________                                        Amino               Amino                                                     acid      Abundance acid         Abundance                                    ______________________________________                                        A         9.40%     C            none                                         D         none      E            9.40%                                        F         none      G            7.10%                                        H         none      I            none                                         K         6.60%     L            9.50% mfaa                                   M         4.90%     N            none                                         P         6.00%     Q            6.00%                                        R         9.50%     S            6.60%                                        T         6.6%      V            7.10%                                        W         4.90% lfaa                                                                              Y            none                                         stop      6.60%                                                               ______________________________________ 
                                       j   (1/ratio).sup.j                                                                              (ratio).sup.j                                                                           stop-free                                        ______________________________________                                        1   1.9388          .51579   0.934                                            2   3.7588          .26604   0.8723                                           3   7.2876          .13722   0.8148                                           4   14.1289         .07078   0.7610                                           5   27.3929        3.65 · 10.sup.-2                                                               0.7108                                           6   53.109         1.88 · 10.sup.-2                                                               0.6639                                           7   102.96         9.72 · 10.sup.-3                                                               0.6200                                           ______________________________________                                        E. 
Unoptimized NNS (NNK gives identical distribution)                                 T        C        A      G                                            ______________________________________                                        1       .25      .25      .25     .25                                         2       .25      .25      .25     .25                                         3       .0       .5       .0     0.5                                          ______________________________________                                        Amino               Amino                                                     acid      Abundance acid         Abundance                                    ______________________________________                                        A         6.25%     C            3.125%                                       D         3.125%    E            3.125%                                       F         3.125%    G            6.25%                                        H         3.125%    I            3.125%                                       K         3.125%    L            9.375%                                       M         3.125%    N            3.125%                                       P         6.25%     Q            3.125%                                       R         9.375%    S            9.375%                                       T         6.25%     V            6.25%                                        W         3.125%    Y            3.125                                        stop      3.125%                                                              ______________________________________                                        j   (1/ratio).sup.j                                                                              (ratio).sup.j                                                                           stop-free                                        ______________________________________               
                         1   3.0             .33333   .96875                                           2   9.0             .11111   .9385                                            3   27.0            .03704   .90915                                           4   81.0            .01234567                                                                              .8807                                            5   243.0           .0041152  .8532                                           6   729.0          1.37 · 10.sup.-3                                                                .82655                                          7   2187.0         4.57 · 10.sup.-4                                                               .8007                                            ______________________________________                                    
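The abundances tabulated above follow from the nucleotide fractions: under the standard genetic code, the probability of a codon is the product of the fractions of its three bases, and the abundance of an amino acid is the sum over its codons. The following Python sketch illustrates that calculation for the unoptimized NNS codon of part E. The function names (`abundances`, `summarize`, `fractions`) are illustrative only, and treating the least-favored amino acid as the least abundant one that is actually encoded is an assumption made here to match part C, where E, K, M, Q and W are not encoded.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = list(zip(product(BASES, repeat=3), AMINO))


def abundances(p1, p2, p3):
    """Return {amino acid: abundance} for base fractions at codon positions 1-3."""
    out = {}
    for (b1, b2, b3), aa in CODONS:
        out[aa] = out.get(aa, 0.0) + p1[b1] * p2[b2] * p3[b3]
    return out


def summarize(abund):
    """Return (lfaa, mfaa, ratio, stop_free) for one variegated codon.

    Assumption: lfaa is the least abundant amino acid that is encoded at all.
    """
    encoded = {aa: p for aa, p in abund.items() if aa != "*" and p > 0.0}
    lfaa = min(encoded, key=encoded.get)
    mfaa = max(encoded, key=encoded.get)
    return lfaa, mfaa, encoded[lfaa] / encoded[mfaa], 1.0 - abund["*"]


def fractions(t, c, a, g):
    """Map the four fractions onto the bases in T, C, A, G order."""
    return dict(zip(BASES, (t, c, a, g)))


# Part E: unoptimized NNS (equimolar at positions 1 and 2, C or G at position 3).
nns = abundances(fractions(.25, .25, .25, .25),
                 fractions(.25, .25, .25, .25),
                 fractions(0.0, 0.5, 0.0, 0.5))
lfaa, mfaa, ratio, stop_free = summarize(nns)
print(f"ratio = {ratio:.5f}, stop-free = {stop_free:.5f}")
```

Passing the fractions of parts B, C or D to `abundances` in the same way gives the corresponding abundance columns, to within the rounding of the tabulated fractions.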

                                      TABLE 11
__________________________________________________________________________
Calculate worst codon.
__________________________________________________________________________
Program "Find worst vgCodon within Serr of given distribution."
INITIALIZE-MEMORY-OF-ABUNDANCES
Comment Serr is the error level, expressed as a fraction (0.05 for 5%).
READ Serr
Comment T1i, C1i, A1i, G1i, T2i, C2i, A2i, G2i, T3i, G3i
Comment are the intended nt-distribution.
READ T1i, C1i, A1i, G1i
READ T2i, C2i, A2i, G2i
READ T3i, G3i
Fdwn = 1.-Serr
Fup  = 1.+Serr
DO (t1 = T1i*Fdwn to T1i*Fup in 7 steps)
. DO (c1 = C1i*Fdwn to C1i*Fup in 7 steps)
. . DO (a1 = A1i*Fdwn to A1i*Fup in 7 steps)
. . . g1 = 1. - t1 - c1 - a1
. . . IF ( (g1-G1i)/G1i .lt. -Serr)
Comment g1 too far below G1i, push it back
. . . . g1 = G1i*Fdwn
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . . .end_IF_block
. . . IF ( (g1-G1i)/G1i .gt. Serr)
Comment g1 too far above G1i, push it back
. . . . g1 = G1i*Fup
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . . .end_IF_block
. . . DO (a2 = A2i*Fdwn to A2i*Fup in 7 steps)
. . . . DO (c2 = C2i*Fdwn to C2i*Fup in 7 steps)
. . . . . DO (g2 = G2i*Fdwn to G2i*Fup in 7 steps)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF ( (t2-T2i)/T2i .lt. -Serr)
Comment t2 too far below T2i, push it back
. . . . . . . t2 = T2i*Fdwn
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . ..end_IF_block
. . . . . . IF ( (t2-T2i)/T2i .gt. Serr)
Comment t2 too far above T2i, push it back
. . . . . . . t2 = T2i*Fup
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . ..end_IF_block
. . . . . . IF (g2 .gt. 0.0 .and. t2 .gt. 0.0)
. . . . . . . t3 = 0.5*(1.-Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5*(1.+Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . ..end_IF_block
. . . . . ..end_DO_loop ! g2
. . . . ..end_DO_loop ! c2
. . . ..end_DO_loop ! a2
. . ..end_DO_loop ! a1
. ..end_DO_loop ! c1
..end_DO_loop ! t1
WRITE the WORST distribution and the abundances.
__________________________________________________________________________
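The search outlined in Table 11 can be sketched in Python as follows. This is an illustration under stated assumptions, not the program actually used: the pseudocode does not spell out COMPARE-ABUNDANCES-TO-PREVIOUS-ONES, so "worst" is taken here to mean the lowest ratio of least-favored to most-favored amino-acid abundance; the third position is varied around the intended T3i rather than around a hard-coded 0.5; and all function names are hypothetical. Because of these choices the result need not reproduce Table 12 exactly.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = list(zip(product(BASES, repeat=3), AMINO))


def lfaa_mfaa_ratio(p1, p2, p3):
    """Abundance ratio of least- to most-favored amino acid (stops excluded)."""
    abund = {}
    for (b1, b2, b3), aa in CODONS:
        abund[aa] = abund.get(aa, 0.0) + p1[b1] * p2[b2] * p3[b3]
    values = [p for aa, p in abund.items() if aa != "*"]
    return min(values) / max(values)


def grid(center, serr, steps):
    """`steps` evenly spaced values from center*(1-serr) to center*(1+serr)."""
    lo, hi = center * (1.0 - serr), center * (1.0 + serr)
    return [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]


def close_position(free, target, serr):
    """Derive the fourth fraction from the three free ones; if it strays more
    than serr from its target, push it back and rescale the other three."""
    dep = 1.0 - sum(free)
    lo, hi = target * (1.0 - serr), target * (1.0 + serr)
    if dep < lo or dep > hi:
        dep = min(max(dep, lo), hi)
        factor = (1.0 - dep) / sum(free)
        free = tuple(x * factor for x in free)
    return free, dep


def worst_codon(pos1, pos2, t3i, serr, steps=7):
    """Scan the error box around the intended distribution and return
    (ratio, p1, p2, p3) for the lowest lfaa/mfaa ratio found (assumed criterion)."""
    worst = None
    axes1 = [grid(pos1[b], serr, steps) for b in "TCA"]  # g1 is derived
    axes2 = [grid(pos2[b], serr, steps) for b in "ACG"]  # t2 is derived
    for free1 in product(*axes1):
        (t1, c1, a1), g1 = close_position(free1, pos1["G"], serr)
        p1 = {"T": t1, "C": c1, "A": a1, "G": g1}
        for free2 in product(*axes2):
            (a2, c2, g2), t2 = close_position(free2, pos2["T"], serr)
            if g2 <= 0.0 or t2 <= 0.0:
                continue
            p2 = {"T": t2, "C": c2, "A": a2, "G": g2}
            for t3 in (t3i * (1.0 - serr), t3i, t3i * (1.0 + serr)):
                p3 = {"T": t3, "C": 0.0, "A": 0.0, "G": 1.0 - t3}
                ratio = lfaa_mfaa_ratio(p1, p2, p3)
                if worst is None or ratio < worst[0]:
                    worst = (ratio, p1, p2, p3)
    return worst


# Intended distribution: the unrestrained, optimized vgCodon of Table 10.
pos1 = {"T": .27, "C": .19, "A": .27, "G": .27}
pos2 = {"T": .21, "C": .15, "A": .43, "G": .21}
# A 3-step grid keeps this example fast; steps=7 matches the pseudocode.
ratio, p1, p2, p3 = worst_codon(pos1, pos2, t3i=0.5, serr=0.05, steps=3)
print(f"worst lfaa/mfaa ratio on a 3-step grid: {ratio:.4f}")
```

The 3-step grid visits only the ends and centre of each range, which is a subset of the 7-step grid of Table 11; with `steps=7` the scan evaluates 7^6 x 3 distributions and takes correspondingly longer.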

                  TABLE 12
______________________________________
Abundances obtained
using optimum vgCodon assuming
5% errors
______________________________________
Amino                  Amino
acid      Abundance    acid    Abundance
______________________________________
A         4.59%        C       2.76%
D         5.45%        E       6.02%
F         2.49% lfaa   G       6.63%
H         3.59%        I       2.71%
K         5.73%        L       6.71%
M         3.00%        N       5.19%
P         3.02%        Q       3.97%
R         7.68% mfaa   S       7.01%
T         4.37%        V       6.00%
W         3.05%        Y       4.77%
stop      5.27%
______________________________________
ratio = Abun(F)/Abun(R) = 0.3248
j     (1/ratio)^j    (ratio)^j      stop-free
______________________________________
1     3.079          .3248          .9473
2     9.481          .1055          .8973
3     29.193         .03425         .8500
4     89.888         .01112         .8052
5     276.78         3.61·10^-3     .7627
6     852.22         1.17·10^-3     .7225
7     2624.1         3.81·10^-4     .6844
______________________________________
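The lower half of Table 12 is a table of powers. A minimal Python sketch, assuming that the "stop-free" column is (1 - stop abundance)^j, i.e. the fraction of sequences with no stop codon among j variegated codons (an interpretation consistent with the tabulated values), is given below; the function name `power_table` is illustrative only.

```python
def power_table(ratio, stop_abundance, jmax=7):
    """Return rows (j, (1/ratio)^j, ratio^j, stop_free^j) for j = 1..jmax.

    ratio          -- Abun(lfaa)/Abun(mfaa) for a single variegated codon
    stop_abundance -- fraction of stop codons for a single variegated codon
    """
    rows = []
    for j in range(1, jmax + 1):
        rows.append((j,
                     (1.0 / ratio) ** j,
                     ratio ** j,
                     (1.0 - stop_abundance) ** j))
    return rows


# Values taken from Table 12: ratio = 0.3248, stop = 5.27%.
for j, inv, rat, stop_free in power_table(0.3248, 0.0527):
    print(f"{j}  {inv:10.3f}  {rat:.3e}  {stop_free:.4f}")
```

Small differences in the last digit relative to Table 12 are to be expected, since the table was evidently computed from an unrounded ratio.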

TABLE 13
BPTI Homologues

R #  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
 -5                                                          -- -- -- -- -- -- -- -- -- -- -- -- --  D -- -- -- -- -- -- --
 -4                                                          -- -- -- -- -- -- -- -- -- -- -- -- --  E -- -- -- -- -- -- --
 -3 -- -- --  F -- -- -- -- -- -- -- -- -- -- -- --  Z -- -- -- -- -- -- -- -- -- -- -- -- -- --  T  P -- -- -- -- -- -- --
 -2 -- -- --  Q  T -- -- -- -- -- --  Q -- -- --  H  G  Z --  Z --  L  Z  R  K -- -- --  R  R --  E  T -- -- -- -- -- -- --
 -1 -- -- --  T  E -- -- -- -- -- --  P -- -- --  D  D  G --  P --  Q  D  D  N -- -- --  Q  K --  R  T -- -- --  Z -- -- --
  1  R  R  R  P  R  R  R  R  R  R  R  L  A  R  R  R  K  R  A  R  R  H  H  R  R  I  K  T  R  R  R  G  D  K  T  R  R  R  R  R
  2  P  P  P  P  P  P  P  P  P  P  P  R  A  P  P  P  R  P  A  R  P  R  P  P  P  N  E  V  H  H  P  F  L  A  V  P  P  P  P  P
  3  D  D  D  D  D  D  D  D  D  D  D  K  K  D  R  T  D  S  K  K  Y  T  K  K  T  G  D  A  R  P  D  L  P  D  E  D  D  D  D  D
  4  F  F  F  L  F  F  F  F  F  F  F  L  Y  F  F  F  I  F  Y  L  A  F  F  F  F  D  S  A  D  D  F  D  I  S  A  F  F  F  F  F
  5  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
  6  L  L  L  Q  L  L  L  L  L  L  L  I  K  E  E  N  R  N  K  I  E  K  Y  Y  N  E  Q  N  D  D  L  T  E  Q  N  L  L  L  L  L
  7  E  E  E  L  E  E  E  E  E  E  E  L  L  L  L  L  L  L  L  L  L  L  L  L  L  L  L  L  K  K  E  S  Q  L  L  E  E  E  E  E
  8  P  P  P  P  P  P  P  P  P  P  P  H  P  P  P  P  P  P  P  H  I  P  P  P  L  P  G  P  P  P  P  P  A  D  P  P  P  P  P  P
  9  P  P  P  Q  P  P  P  P  P  P  P  R  L  A  A  P  P  A  V  R  V  A  A  A  P  K  Y  V  P  P  P  P FG  Y  I  P  P  P  P  P
 10  Y  Y  Y  A  Y  Y  Y  Y  Y  Y  Y  N  R  E  E  E  E  E  R  N  A  E  D  D  E  V  S  I  D  D  Y  V  D  S  V  Y  Y  Y  Y  Y
 11  T  T  T  R  T  T  T  T  T  T  T  P  I  T  T  S  Q  T  Y  P  A  P  P  P  T  V  A  R  K  T  T  T  A  Q  Q  T  T  T  T  T
 12  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  K  G  G  G  G  G  G  G  G  G  G
 13  P  P  P  P  P  P  P  P  P  P  P  R  P  L  L  R  P  P  P  R  P  P  R  R  R  P  P  P  N  I  P  P  L  P  P  P  P  P  P  P
 14  C  T  A  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
 15  K  K  K  K  K  V  G  A  L  I  K  Y  K  K  K  R  K  K  K  Y  M  K  K  L  N  R  M  R -- --  K  R  F  L  R  R  K  K  K  K
 16  A  A  A  A  A  A  A  A  A  A  A  Q  R  A  A  G  G  A  K  D  F  A  A  A  A  A  G  A  G  Q  A  A  G  G  A  A  A  A  A  A
 17  R  R  R  A  A  R  R  R  R  R  R  K  K  Y  R  H  R  S  K  K  F  S  H  Y  L  R  M  F  P  T  K  G  Y  L  F  R  R  R  R  K
 18  I  I  I  L  M  I  I  I  I  I  I  I  I  I  I  I  L  I  F  I  I  I  I  M  I  F  T  I  V  V  M  F  M  F  I  I  M  I  M  M
 19  I  I  I  L  I  I  I  I  I  I  I  P  P  R  R  R  P  R  P  P  S  P  P  P  P  P  S  Q  R  R  I  K  K  K  Q  I  I  I  I  I
 20  R  R  R  R  R  R  R  R  R  R  R  A  S  S  S  R  R  Q  S  A  A  A  R  R  A  R  R  L  A  A  R  R  L  R  L  R  R  R  R  R
 21  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  F  F  F  F  I  Y  Y  F  F  F  F  F  F  F  Y  Y  W  F  F  Y  Y  Y  Y  W  Y  Y  Y  Y  Y
 22  F  F  F  F  F  F  F  F  F  F  F  Y  Y  H  H  Y  F  Y  Y  Y  Y  Y  Y  Y  Y  Y  F  A  Y  Y  F  N  S  F  A  F  F  F  F  F
 23  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  F  Y  Y  Y  Y  Y  Y  F  Y  Y  Y  Y  Y
 24  N  N  N  N  N  N  N  N  N  N  N  N  K  N  N  N  N  N  N  N  S  N  D  N  N  N  N  D  D  K  N  N  N  N  D  N  N  N  N  N
 25  A  A  A  S  A  A  A  A  A  A  A  Q  W  L  R  L  P  S  W  Q  K  W  S  P  S  S  G  A  T  P  A  T  Q  G  A  A  A  A  A  A
 26  K  K  K  T  K  K  K  K  K  K  K  K  K  A  A  E  A  K  K  K  G  A  A  A  H  S  T  V  R  S  K  R  E  T  V  K  K  K  K  K
 27  A  A  A  S  A  A  A  A  A  A  A  K  A  A  A  S  S  S  A  K  A  A  S  S  L  S  S  K  L  A  A  T  T  S  K  A  A  A  A  A
 28  G  G  G  N  G  G  G  G  G  G  G  K  K  Q  Q  N  R  G  K  K  N  K  N  N  H  K  M  G  K  K  G  K  K  M  G  G  G  G  G  G
 29  L  L  L  A  F  L  L  L  L  L  L  Q  Q  Q  Q  K  M  G  Q  Q  K  K  K  K  K  R  A  K  T  R  F  Q  N  A  K  L  L  L  L  F
 30  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
 31  Q  Q  Q  E  E  Q  Q  Q  Q  Q  Q  E  L  L  L  K  E  Q  L  E  Y  Q  N  E  Q  E  E  V  K  V  E  E  E  E  V  Q  Q  Q  Q  E
 32  T  T  T  P  T  T  T  T  T  T  T  G  P  Q  E  V  S  Q  P  R  P  L  K  K  K  K  T  L  A  Q  T  P  E  T  R  T  P  P  P  T
 33  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F
 34  V  V  V  T  V  V  V  V  V  V  V  T  D  I  I  F  I  I  N  D  T  H  I  I  N  I  Q  P  Q  R  V  K  I  L  S  V  V  V  V  V
 35  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  W  Y  Y  Y  Y  Y  Y  Y  W  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y  Y
 36  G  G  G  G  G  G  G  G  G  G  G  S  S  G  G  G  G  G  S  S  S  G  G  G  G  G  G  G  R  G  G  G  G  G  G  G  G  G  G  G
 37  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G
 38  C  T  A  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
 39  R  R  R  Q  R  R  R  R  R  R  R  G  G  G  G  G  K  R  G  G  R  K  P  R  G  G  M  Q  D  D  K  K  Q  M  K  R  R  R  R  K
 40  A  A  A  G  A  A  A  A  A  A  A  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  G  A  G  G  G  G  A  A  A  A  A
 41  K  K  K  N  K  K  K  K  K  K  K  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  D  D  K  N  N  N  N  K  K  K  K  K
 42  R  R  R  N  S  R  R  R  R  R  R  S  A  A  A  A  K  Q  A  S  A  A  A  A  A  A  G  G  H  H  S  G  D  L  G  R  S  R  R  S
 43  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  N  G  G  N  N  N  N  N  N  N  N  N  N
 44  N  N  N  N  N  N  N  N  N  N  N  R  R  R  R  N  N  R  R  R  R  R  N  N  N  N  N  K  N  N  N  R  R  N  K  N  N  N  N  N
 45  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  Y  F  F  F  F  F  F  F  F
 46  K  K  K  E  K  K  K  K  K  K  K  K  K  K  K  E  K  D  K  K  K  S  K  K  K  H  V  Y  K  K  R  K  S  L  Y  K  K  K  K  R
 47  S  S  S  T  S  S  S  S  S  S  S  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  S  T  S  S  S  T  S  S  S  S  S  S  S
 48  A  A  A  T  A  A  A  A  A  A  A  I  I  I  I  R  K  T  I  I  I  I  W  W  I  L  E  E  E  D  A  E  L  Q  Q  A  A  S  A  A
 49  E  E  E  E  E  E  E  E  E  E  E  E  E  D  D  D  A  Q  E  E  E  E  D  D  D  E  K  K  T  H  E  Q  A  K  K  E  E  E  E  E
 50  D  D  D  M  D  D  D  D  D  D  D  E  E  E  E  E  E  Q  E  E  E  K  E  E  E  E  E  E  L  L  D  D  E  E  E  D  D  D  D  D
 51  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
 52  M  M  M  L  M  M  M  M  M  M  E  R  R  R  H  R  V  Q  R  R  R  R  R  R  Q  E  L  R  R  R  M  L  E  L  K  E  M  M  M  M
 53  R  R  R  R  R  R  R  R  R  R  R  R  R  R  R  E  R  G  R  R  R  H  Q  H  R  K  Q  E  C  C  R  D  Q  Q  E  R  R  R  R  R
 54  T  T  T  I  T  T  T  T  T  T  T  T  T  T  T  T  A  V  T  T  T  A  T  T  T  V  T  Y  E  E  T  A  K  T  Y  T  T  T  T  T
 55  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C  C
 56  G  G  G  E  G  G  G  G  G  G  G  I  V  V  V  G  R  V  V  I  V  V  G  V  A  G  R  G  L  E  G  S  I  R  G  G  G  G  G  G
 57  G  G  G  P  G  G  G  G  G  G  G  R  G  G  G  G  P --  G  G  V  G  A  A  A  V --  V  V  L  G  G  N --  I  G  G  G  G  G
 58  A  A  A  P  A  A  A  A  A  A  A  K -- -- --  K  P -- -- -- -- --  S  S  K  R --  P  Y  Y  A  F -- --  P  A  A  A  A  A
 59 -- -- --  Q -- -- -- -- -- -- -- -- -- -- -- --  E -- -- -- -- --  A  G  Y  S --  G  P  R -- -- -- --  G -- -- -- -- --
 60 -- -- --  Q -- -- -- -- -- -- -- -- -- -- -- --  R -- -- -- -- -- --  I  G -- --  D -- -- -- -- -- --  E -- -- -- -- --
 61 -- -- --  T -- -- -- -- -- -- -- -- -- -- -- --  P -- -- -- -- -- -- -- -- -- --  E -- -- -- -- -- --  A -- -- -- -- --
 62 -- -- --  D -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 63 -- -- --  K -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 64 -- -- --  S -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

 1 BPTI (SEQ ID NO: 44)
 2 Engineered BPTI, from MARK87
 3 Engineered BPTI, from MARK87
 4 Bovine Colostrum (DUFT85)
 5 Bovine Serum (DUFT85)
 6 Semisynthetic BPTI, TSCH87
 7 Semisynthetic BPTI, TSCH87
 8 Semisynthetic BPTI, TSCH87
 9 Semisynthetic BPTI, TSCH87
10 Semisynthetic BPTI, TSCH87
11 Engineered BPTI, AUER87
12 Dendroaspis polylepis polylepis (Black mamba) venom I (DUFT85)
13 Dendroaspis polylepis polylepis (Black mamba) venom K (DUFT85)
14 Hemachatus hemachates (Ringhals Cobra) HHV II (DUFT85)
15 Naja nivea (Cape cobra) NNV II (DUFT85)
16 Vipera russelli (Russell's viper) RVV II (TAKA74)
17 Red sea turtle egg white (DUFT85)
18 Snail mucus (Helix pomatia) (WAGN78)
19 Dendroaspis angusticeps (Eastern green mamba) C13 S1 C3 toxin (DUFT85)
20 Dendroaspis angusticeps (Eastern green mamba) C13 S2 C3 toxin (DUFT85)
21 Dendroaspis polylepis polylepis (Black mamba) B toxin (DUFT85)
22 Dendroaspis polylepis polylepis (Black mamba) E toxin (DUFT85)
23 Vipera ammodytes TI toxin (DUFT85)
24 Vipera ammodytes CTI toxin (DUFT85)
25 Bungarus fasciatus VIII B toxin (DUFT85)
26 Anemonia sulcata (sea anemone) 5 II (DUFT85)
27 Homo sapiens HI-14 "inactive" domain (DUFT85)
28 Homo sapiens HI-8 "active" domain (DUFT85)
29 beta bungarotoxin B1 (DUFT85)
30 beta bungarotoxin B2 (DUFT85)
31 Bovine spleen TI II (FIOR85)
32 Tachypleus tridentatus (Horseshoe crab) hemocyte inhibitor (NAKA87)
33 Bombyx mori (silkworm) SCI-III (SASA84)
34 Bos taurus (inactive) BI-14
35 Bos taurus (active) BI-8
36 Engineered BPTI (KR15, ME52): Auerswald '88, Biol Chem Hoppe-Seyler, 369 Supplement, pp 27-35.
37 Isoaprotinin G-1: Siekmann, Wenzel, Schroder, and Tschesche '88, Biol Chem Hoppe-Seyler, 369: 157-163.
38 Isoaprotinin 2: Siekmann, Wenzel, Schroder, and Tschesche '88, Biol Chem Hoppe-Seyler, 369: 157-163.
39 Isoaprotinin G-2: Siekmann, Wenzel, Schroder, and Tschesche '88, Biol Chem Hoppe-Seyler, 369: 157-163.
40 Isoaprotinin 1: Siekmann, Wenzel, Schroder, and Tschesche '88, Biol Chem Hoppe-Seyler, 369: 157-163.

Note:
a) both beta bungarotoxins have residue 15 deleted.
b) B. mori has an extra residue between C5 and C14; we have assigned F and G to residue 9.
c) all natural proteins have C at 5, 14, 30, 38, 51, & 55.
d) all homologues have F33 and G37.
e) extra C's in bungarotoxins form interchain cystine bridges.
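As a cross-check on the alignment, column 1 of Table 13 (BPTI) can be read down residues 1 to 58 as a single sequence, and its ionizable residues counted in the manner of Table 14. The Python sketch below does this. The definitions of the "+" and "ions" columns used here (net charge as K + R + NH - D - E - CO2, and ions as D + E + K + R + NH + CO2, with one free amino and one free carboxyl terminus per chain) are inferred from the arithmetic of Table 14 and are assumptions; the function name `tally` is illustrative.

```python
# BPTI, residues 1-58, read down column 1 of Table 13.
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"


def tally(seq):
    """Count ionizable groups in a single-chain sequence (one-letter codes)."""
    counts = {aa: seq.count(aa) for aa in "DEKRYH"}
    counts["NH"] = 1    # assumed: one free amino terminus
    counts["CO2"] = 1   # assumed: one free carboxyl terminus
    # Assumed definitions, inferred from the rows of Table 14.
    counts["+"] = (counts["K"] + counts["R"] + counts["NH"]
                   - counts["D"] - counts["E"] - counts["CO2"])
    counts["ions"] = (counts["D"] + counts["E"] + counts["K"]
                      + counts["R"] + counts["NH"] + counts["CO2"])
    return counts


result = tally(BPTI)
print(result)
```

For BPTI this gives D = 2, E = 2, K = 4, R = 6, Y = 4, H = 0, a net charge of +6 and 16 ions, in agreement with identifier 1 of Table 14.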

______________________________________
Identification codes for Tables 14 and 15
______________________________________
1   BPTI
2   synthetic BPTI, Tan & Kaiser, Biochem. 16(8)1531-41
3   Semisynthetic BPTI, TSCH87
4   Semisynthetic BPTI, TSCH87
5   Semisynthetic BPTI, TSCH87
6   Semisynthetic BPTI, TSCH87
7   Semisynthetic BPTI, TSCH87
8   Engineered BPTI, AUER87
9   BPTI Auerswald &al GB 2 208 511A
10  BPTI Auerswald &al GB 2 208 511A
11  Engineered BPTI From MARK87
12  Engineered BPTI From MARK87
13  BPTI (KR15, ME52): Auerswald '88, Biol Chem Hoppe-Seyler, 369 Suppl, pp 27-35.
14  BPTI CA30/CA51 Eigenbrot &al, Protein Engineering 3(7)591-598 ('90)
15  Isoaprotinin 2: Siekmann &al '88, Biol Chem Hoppe-Seyler, 369: 157-163.
16  Isoaprotinin G-2: Siekmann &al '88, Biol Chem Hoppe-Seyler, 369: 157-163.
17  BPTI Engineered, Auerswald &al GB 2 208 511A
18  BPTI Engineered, Auerswald &al GB 2 208 511A
19  BPTI Engineered, Auerswald &al GB 2 208 511A
20  Isoaprotinin G-1: Siekmann &al '88, Biol Chem Hoppe-Seyler, 369: 157-163.
21  BPTI Engineered, Auerswald &al GB 2 208 511A
22  BPTI Engineered, Auerswald &al GB 2 208 511A
23  Bovine Serum (in Dufton '85)
24  Bovine spleen TI II (FIOR85)
25  Snail mucus (Helix pomatia) (WAGN78)
26  Hemachatus hemachates (Ringhals Cobra) HHV II (in Dufton '85)
27  Red sea turtle egg white (in Dufton '85)
28  Bovine Colostrum (in Dufton '85)
29  Naja nivea (Cape cobra) NNV II (in Dufton '85)
30  Bungarus fasciatus VIII B toxin (in Dufton '85)
31  Vipera ammodytes TI toxin (in Dufton '85)
32  Porcine ITI domain 1, (in CREI87)
33  Human Alzheimer's beta APP protease inhibitor, (SHIN90)
34  Equine ITI domain 1, in Creighton & Charles
35  Bos taurus (inactive) BI-8e (ITI domain 1)
36  Anemonia sulcata (sea anemone) 5 II (in Dufton '85)
37  Dendroaspis polylepis polylepis (Black Mamba) E toxin (in Dufton '85)
38  Vipera russelli (Russell's viper) RVV II (TAKA74)
39  Tachypleus tridentatus (Horseshoe crab) hemocyte inhibitor (NAKA87)
40  LACI 2 (Factor Xa) (WUNT88)
41  Vipera ammodytes CTI toxin (in Dufton '85)
42  Dendroaspis polylepis polylepis (Black Mamba) venom K (in Dufton '85)
43  Homo sapiens HI-8e "inactive" domain (in Dufton '85)
44  Green Mamba toxin K, (in CREI87)
45  Dendroaspis angusticeps (Eastern green mamba) C13 S1 C3 toxin (in Dufton '85)
46  LACI 3
47  Equine ITI domain 2, (CREI87)
48  LACI 1 (VIIa)
49  Dendroaspis polylepis polylepis (Black Mamba) B toxin (in Dufton '85)
50  Porcine ITI domain 2, Creighton and Charles
51  Homo sapiens HI-8t "active" domain (in Dufton '85)
52  Bos taurus (active) BI-8t
53  Trypstatin Kito &al ('88) J Biol Chem 263(34)18104-07
54  Dendroaspis angusticeps (Eastern Green Mamba) C13 S2 C3 toxin (in Dufton '85)
55  Green Mamba I venom Creighton & Charles '87 CSHSQB 52:511-519.
56  beta bungarotoxin B2 (in Dufton '85)
57  Dendroaspis polylepis polylepis (Black Mamba) venom I (in Dufton '85)
58  beta bungarotoxin B1 (in Dufton '85)
59  Bombyx mori (silkworm) SCI-III (SASA84)
______________________________________

                  TABLE 14
______________________________________
Tally of Ionizable groups
Identifier   D    E    K    R    Y    H    NH   CO2   +    ions
______________________________________
 1           2    2    4    6    4    0    1    1     6    16
 2           2    2    4    6    4    0    1    1     6    16
 3           2    2    3    6    4    0    1    1     5    15
 4           2    2    3    6    4    0    1    1     5    15
 5           2    2    3    6    4    0    1    1     5    15
 6           2    2    3    6    4    0    1    1     5    15
 7           2    2    3    6    4    0    1    1     5    15
 8           2    3    4    6    4    0    1    1     5    17
 9           2    2    3    5    4    0    1    1     4    14
10           2    3    3    6    4    0    1    1     4    16
11           2    2    4    6    4    0    1    1     6    16
12           2    2    4    6    4    0    1    1     6    16
13           2    3    3    7    4    0    1    1     5    17
14           2    2    4    6    4    0    1    1     6    16
15           2    2    4    6    4    0    1    1     6    16
16           2    2    4    6    4    0    1    1     6    16
17           2    2    3    5    4    0    1    1     4    14
18           2    3    3    5    4    0    1    1     3    15
19           2    3    3    5    4    0    1    1     3    15
20           2    2    4    5    4    0    1    1     5    15
21           2    3    3    4    4    0    1    1     2    14
22           2    4    3    4    4    0    1    1     1    15
23           2    4    4    4    4    0    1    1     2    16
24           2    3    5    4    4    0    1    1     4    16
25           1    1    2    4    4    0    1    1     4    10
26           2    3    2    5    3    1    1    1     2    14
27           2    4    6    8    3    0    1    1     8    22
28           2    4    2    3    3    0    1    1    -1    13
29           1    4    2    7    2    2    1    1     4    16
30           1    2    5    3    4    2    1    1     5    13
31           4    1    5    3    4    2    1    1     3    15
32           1    4    3    2    4    1    1    1     0    12
33           2    6    1    5    3    0    1    1    -2    16
34           2    4    2    2    3    1    1    1    -2    12
35           2    2    3    2    4    0    1    1     1    11
36           1    5    4    5    4    1    1    1     3    17
37           0    2    6    3    3    3    1    1     7    13
38           2    5    3    7    3    2    1    1     3    19
39           3    3    5    5    4    0    1    1     4    18
40           3    7    4    3    4    0    1    1    -3    19
41           3    2    4    6    5    1    1    1     5    17
42           1    2    8    5    4    0    1    1    10    18
43           1    4    2    2    4    0    1    1    -1    11
44           1    2    9    4    5    0    1    1    10    18
45           0    2    8    4    5    0    1    1    10    16
46           1    3
  5   5   3   0   1    1    6    16                         47     3     4      4   3   3   0   1    1    0    16                         48     3     6      5   4   1   1   1    1    0    20                         49     0     3      3   5   5   0   1    1    5    13                         50     2     6      4   2   3   0   1    1    -2   16                         51     2     4      4   3   3   0   1    1    1    15                         52     1     4      6   2   3   0   1    1    3    15                         53     2     2      5   1   4   0   1    1    2    12                         54     2     3      6   8   3   1   1    1    9    21                         55     1     3      6   7   3   1   1    1    9    19                         56     6     2      6   7   4   3   1    1    5    23                         57     0     3      7   7   3   1   1    1    11   19                         58     6     2      5   7   4   2   1    1    4    22                         59     4     7      3   1   4   0   1    1    -7   17                         ______________________________________                                    
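
The last two columns of Table 14 appear to be derived from the other tallies. The Python sketch below assumes that "+" is the net charge, counting K, R and the amino terminus (NH) as positive and D, E and the carboxy terminus (CO2) as negative, and that "ions" is the total number of those same groups. This reading is inferred from the tabulated values, not stated in the text; Y and H are tallied in the table but treated here as uncharged. The function name `charge_summary` is hypothetical.

```python
def charge_summary(d, e, k, r, nh=1, co2=1):
    """Return (net_charge, ion_count) from tallies of ionizable groups.

    Assumption (inferred from Table 14, not stated in the text):
    K, R and the amino terminus count as +1 each; D, E and the
    carboxy terminus count as -1 each; Y and H are not counted.
    """
    positive = k + r + nh
    negative = d + e + co2
    return positive - negative, positive + negative


# Identifier 1 in Table 14: D=2, E=2, K=4, R=6
net, ions = charge_summary(d=2, e=2, k=4, r=6)
print(net, ions)
```

Under these assumptions the function gives 6 and 16 for identifier 1, and -7 and 17 for identifier 59, matching the tabulated entries.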

                          TABLE 15
__________________________________________________________________________
Frequency of Amino Acids at Each Position
in BPTI and 58 Homologues
Res.  Different
Id.   AAs     Contents                                  First
__________________________________________________________________________
-5      2     -58 D                                     --
-4      2     -58 E                                     --
-3      5     -55 P T Z F                               --
-2     10     -43 R3 Z3 Q3 T2 E G H K L                 --
-1     11     -41 D4 P3 R2 T2 Q2 G K N Z 3              --
1      13     R35 K6 T4 A3 H2 G2 L M N P I D --         R
2      10     P35 R6 A4 V4 H3 E3 N F I L                P
3      11     D32 K8 S4 A3 T3 R2 E2 P2 G L Y            D
4       9     F34 A6 D4 L4 S4 Y3 I2 W V                 F
5       1     C59                                       C
6      13     L25 N7 E6 K4 Q4 I3 D2 S2 Y2 R F T A       L
7       7     L28 E25 K2 F Q S T                        E
8      10     P46 H3 D2 G2 E I K L A Q                  P
9      12     P30 A9 I4 V4 R3 Y3 L F Q H E K            P
9a      2     -58 G                                     --
10      9     Y24 E8 D8 V6 R3 S3 A3 N3 I                Y
11     11     T31 Q8 P7 R3 A3 Y2 K S D V I              T
12      2     G58 K                                     G
13      5     P45 R7 L4 I2 N                            P
14      3     C57 A T                                   C
15     12     K22 R12 L7 V6 Y3 M2 -2 N I A F G          K
16      7     A41 G9 F2 D2 K2 Q2 R                      A
17     14     R19 L8 K7 F5 M4 Y4 H2 A2 S2 G2 I N T P    R
18      8     I41 M7 F4 L2 V2 E T A                     I
19     10     I24 P12 R8 K5 S4 Q2 L N E T               I
20      5     R39 A8 L6 S5 Q                            R
21      5     Y35 F17 W5 I L                            Y
22      6     F32 Y18 A5 H2 S N                         F
23      2     Y52 F7                                    Y
24      4     N47 D8 K3 S                               N
25     13     A29 S6 Q4 W5 P3 T2 L2 R N K V I           A
26     11     K31 A9 T5 S3 V3 R2 E2 G H F Q             K
27      8     A32 S11 K5 T4 Q3 L2 I E                   A
28      7     G32 K13 N5 M4 Q2 R2 H                     G
29     10     L22 K13 Q11 A5 F2 R2 N G M T              L
30      2     C58 A                                     C
31     10     Q25 E17 L5 V5 K2 N A R I Y                Q
32     11     T25 P11 K4 Q4 L4 R3 E3 G2 S A V           T
33      1     F59                                       F
34     13     V24 I10 T5 N3 Q3 D3 K3 F2 H2 R S P L      V
35      2     Y56 W3                                    Y
36      3     G50 S8 R                                  G
37      1     G59                                       G
38      3     C57 A T                                   C
39      9     R25 G13 K6 Q4 E3 M3 L2 D2 P               R
40      2     G35 A24                                   A
41      3     N33 K24 D2                                K
42     12     R22 A12 G8 S6 Q2 H2 N2 M D E K L          R
43      2     N57 G2                                    N
44      3     N40 R14 K5                                N
45      2     F58 Y                                     F
46     11     K39 Y5 E4 S2 V2 D2 R H T A L              K
47      2     S36 T23                                   S
48     11     A23 I11 E6 Q6 L4 K2 T2 W2 S D R           A
49      8     E37 K8 D6 Q3 A2 P H T                     E
50      7     E27 D25 K2 L2 M Q Y                       D
51      2     C58 A                                     C
52      9     M17 R15 E8 L7 K6 Q2 T2 H V                M
53     11     R37 E6 Q5 K2 C2 H2 A N G D W              R
54      8     T41 Y5 A4 V3 I2 E2 M K                    T
55      1     C59                                       C
56     10     G33 V9 R5 I4 E3 L A S T K                 G
57     12     G34 V6 -5 A3 R2 I2 P2 D K S L N           G
58     10     A25 -15 P7 K3 S2 Y2 G2 F D R              A
__________________________________________________________________________
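
The "Contents" entries of Table 15 can be checked mechanically. The Python sketch below assumes each token is a one-letter amino acid code (or "-" for a gap) followed by an optional count, with a missing count meaning one occurrence, and that a lone "--" is a rendering of a single gap. These conventions are inferred from the table, and the function name `parse_contents` is hypothetical. For an intact row, the number of distinct symbols should equal the "Different AAs" column and the counts should sum to 59 (BPTI plus 58 homologues).

```python
import re
from collections import Counter

# One symbol ("-" or "--" for a gap, or an uppercase letter) plus optional count.
TOKEN = re.compile(r"(--?|[A-Z])(\d*)")


def parse_contents(contents):
    """Parse a Table 15 'Contents' string into a Counter of symbol -> count."""
    counts = Counter()
    for token in contents.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            # Tokens that do not fit the assumed notation are reported, not guessed.
            raise ValueError(f"unrecognized token: {token!r}")
        symbol, number = match.groups()
        if symbol.startswith("-"):
            symbol = "-"  # assumption: "--" is a rendering of a single gap
        counts[symbol] += int(number) if number else 1
    return counts


row10 = parse_contents("Y24 E8 D8 V6 R3 S3 A3 N3 I")
print(len(row10), sum(row10.values()))
```

Raising on unrecognized tokens is a deliberate choice here: rows whose text was damaged in reproduction are flagged for manual inspection instead of being silently miscounted.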

                          TABLE 16
__________________________________________________________________________
Exposure in BPTI
__________________________________________________________________________
Coordinates taken from
Brookhaven Protein Data Bank entry 6PTI.
HEADER PROTEINASE INHIBITOR (TRYPSIN) 13-MAY-87   6PTI
COMPND BOVINE PANCREATIC TRYPSIN INHIBITOR
COMPND 2(/BPTI$,CRYSTAL FORM /III$)
AUTHOR A.WLODAWER
Solvent radius = 1.40
Atomic radii given in Table 7
Areas in Å².
                       Not                 Not
           Total     covered             covered
Residue     area     by M/C  fraction     at all  fraction
__________________________________________________________________________
ARG 1     342.45    205.09    0.5989    152.49    0.4453
PRO 2     239.12     92.65    0.3875     47.56    0.1989
ASP 3     272.39    158.77    0.5829    143.23    0.5258
PHE 4     311.33    137.82    0.4427     43.21    0.1388
CYS 5     241.06     48.36    0.2006      0.23    0.0010
LEU 6     280.98    151.45    0.5390    115.87    0.4124
GLU 7     291.39    128.91    0.4424     90.39    0.3102
PRO 8     236.12    128.71    0.5451     99.98    0.4234
PRO 9     236.09    109.82    0.4652     45.80    0.1940
TYR 10    330.97    153.63    0.4642     79.49    0.2402
THR 11    249.20     80.10    0.3214     64.99    0.2608
GLY 12    184.21     56.75    0.3081     23.05    0.1252
PRO 13    240.07    130.25    0.5426     75.27    0.3136
CYS 14    237.10     75.55    0.3186     53.52    0.2257
LYS 15    310.77    200.25    0.6444    192.00    0.6178
ALA 16    209.41     66.63    0.3182     45.59    0.2177
ARG 17    351.09    243.67    0.6940    201.48    0.5739
ILE 18    277.10    100.51    0.3627     58.95    0.2127
ILE 19    278.03    146.06    0.5254     96.05    0.3455
ARG 20    339.11    144.65    0.4266     43.81    0.1292
TYR 21    333.60    102.24    0.3065     69.67    0.2089
PHE 22    306.08     70.64    0.2308     23.01    0.0752
TYR 23    338.66     77.05    0.2275     17.34    0.0512
ASN 24    264.88     99.03    0.3739     38.69    0.1461
ALA 25    211.15     85.13    0.4032     48.20    0.2283
LYS 26    313.29    216.14    0.6899    202.84    0.6474
ALA 27    210.66     96.05    0.4560     54.78    0.2601
GLY 28    186.83     71.52    0.3828     32.09    0.1718
LEU 29    280.70    132.42    0.4718     93.61    0.0812
GLN 31    301.15    141.80    0.4709     82.64    0.2744
THR 32    251.26    138.17    0.5499     76.47    0.3043
PHE 33    304.27     59.79    0.1965     18.91    0.0622
VAL 34    251.56    109.78    0.4364     42.36    0.1684
TYR 35    332.64     80.52    0.2421     15.05    0.0452
GLY 36    187.06     11.90    0.0636      1.97    0.0105
GLY 37    185.28     84.26    0.4548     39.17    0.2114
CYS 38    234.56     73.64    0.3139     26.40    0.1125
ARG 39    417.13    304.62    0.7303    250.73    0.6011
ALA 40    209.53     94.01    0.4487     52.95    0.2527
LYS 41    314.60    166.23    0.5284    108.77    0.3457
ARG 42    349.06    232.83    0.6670    179.59    0.5145
ASN 43    266.47     38.53    0.1446      5.32    0.0200
ASN 44    269.65     91.08    0.3378     23.39    0.0867
PHE 45    313.22     69.73    0.2226     14.79    0.0472
LYS 46    309.83    217.18    0.7010    155.73    0.5026
SER 47    224.78     69.11    0.3075     24.80    0.1103
ALA 48    211.01     82.06    0.3889     31.07    0.1473
ASP 50    299.53    156.42    0.5222     95.96    0.3204
CYS 51    238.68     24.51    0.1027      0.00    0.0000
MET 52    293.05     89.48    0.3054     66.70    0.2276
ARG 53    356.20    224.61    0.6306    189.75    0.5327
THR 54    251.53    116.43    0.4629     51.64    0.2053
CYS 55    240.40     69.95    0.2910      0.00    0.0000
GLY 56    184.66     60.79    0.3292     32.78    0.1775
ALA 58    no position given in Protein Data Bank
__________________________________________________________________________
"Total area" is the area measured by a rolling sphere of radius 1.4 Å
where only the atoms within the residue are considered. This takes account
of conformation.
"Not covered by M/C" is the area measured by a rolling sphere of radius
1.4 Å where all mainchain atoms are considered; "fraction" is the exposed
area divided by the total area. Surface buried by mainchain atoms is more
definitely covered than is surface covered by side group atoms.
"Not covered at all" is the area measured by a rolling sphere of radius
1.4 Å where all atoms of the protein are considered.
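
The two "fraction" columns of Table 16 follow from the definition in the notes: exposed area divided by total area. The Python sketch below illustrates that arithmetic for a single residue. The function name `exposure_fractions` and the rounding to four decimal places (chosen to match the precision shown in the table) are assumptions made for illustration.

```python
def exposure_fractions(total_area, not_covered_by_mc, not_covered_at_all):
    """Return the two exposure fractions for one residue.

    Each fraction is an exposed area divided by the residue's total area
    (all areas in square angstroms), rounded to four decimal places.
    """
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    return (
        round(not_covered_by_mc / total_area, 4),
        round(not_covered_at_all / total_area, 4),
    )


# ARG 1 row of Table 16: total 342.45, not covered by M/C 205.09,
# not covered at all 152.49
print(exposure_fractions(342.45, 205.09, 152.49))
```

For the ARG 1 row this reproduces the tabulated fractions 0.5989 and 0.4453.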

                  TABLE 17
______________________________________
Plasmids used in Detailed Example I
Phage  Contents
______________________________________
LG1    ##STR90##
       adaptor
pLG2   ##STR91##
       ##STR92##
pLG3   ##STR93##
pLG4   ##STR94##
       ##STR95##
pLG5   ##STR96##
       ##STR97##
pLG6   ##STR98##
       ##STR99##
pLG7   ##STR100##
       ##STR101##
pLG8   ##STR102##
pLG9   pLG7 mutated to display BPTI (V15.sub.BPTI)
pLG10  ##STR103##
pLG11  ##STR104##
______________________________________

TABLE 18
______________________________________
Enzyme sites eliminated when ##STR105##
______________________________________
##STR106##  ##STR107##  ##STR108##  ##STR109##
##STR110##  ##STR111##  ##STR112##  ##STR113##
##STR114##  ##STR115##  ##STR116##  ##STR117##
##STR118##  ##STR119##  ##STR120##  ##STR121##
##STR122##  ##STR123##  ##STR124##  ##STR125##
##STR126##
______________________________________

TABLE 19
______________________________________
Enzymes not cutting M13mp18
______________________________________
##STR127##  ##STR128##  ##STR129##  ##STR130##
##STR131##  ##STR132##  ##STR133##  ##STR134##
##STR135##  ##STR136##  ##STR137##  ##STR138##
##STR139##  ##STR140##  ##STR141##  ##STR142##
##STR143##  ##STR144##  ##STR145##  ##STR146##
##STR147##  ##STR148##  ##STR149##  ##STR150##
##STR151##  ##STR152##  ##STR153##  ##STR154##
##STR155##  ##STR156##  ##STR157##  ##STR158##
##STR159##  ##STR160##  ##STR161##  ##STR162##
##STR163##  ##STR164##
______________________________________

TABLE 20
______________________________________
##STR165##
______________________________________
##STR166##  ##STR167##  ##STR168##  ##STR169##
##STR170##  ##STR171##  ##STR172##  ##STR173##
##STR174##  ##STR175##  ##STR176##  ##STR177##
##STR178##  ##STR179##  ##STR180##  ##STR181##
##STR182##
______________________________________

TABLE 21
__________________________________________________________________________
Enzymes tested on Ambig DNA
Enzyme      Recognition    Symm  cuts   Supply
__________________________________________________________________________
##STR183##  GTMKAC         P     2 & 4  <B,M,I,N,P,T
##STR184##  CTTAAG         P     1 & 5  <N
##STR185##  GGGCCC         P     5 & 1  <M,I,N,P,T
##STR186##  TTCGAA         P     2 & 4  ##STR187##
##STR188##  ATGCAT         P     5 & 1  ##STR189## ##STR190##
##STR191##  CCTAGG         P     1 & 5  <N
##STR192##  GGATCC         P     1 & 5  <S,B,M,I,N,P,T
##STR193##  TGATCA         P     1 & 5  <S,B,M,I,N,T
##STR194##  TCCGGA         P     1 & 5  <N
##STR195##  GCGCGC         P     1 & 5  <N,T
##STR196##  GGTNACC        P     1 & 6  <S,B,M,N,T
##STR197##  CCANNNNN       P     8 & 4  <N,P,T
##STR198##  RGGNCCY        P     2 & 5  ##STR199##
##STR200##  CCTNNNNN       P     5 & 6  <N(soon)
##STR201##  GAATTC         P     1 & 5  <S,B,M,I,N,P,T
##STR202##  GATATC         P     3 & 3  <S,B,M,I,N,P,T
##STR203##  GCTNAGC        P     2 & 5  <T
##STR204##  AAGCTT         P     1 & 5  <S,B,M,I,N,P,T
##STR205##  GTTAAC         P     3 & 3  <S,B,M,I,N,P,T
##STR206##  GGTACC         P     5 & 1  <S,B,M,I,N,P,T ; ##STR207##
##STR208##  ACGCGT         P     1 & 5  <M,N,P,T
##STR209##  GGCGCC         P     2 & 4  <B,N,T
##STR210##  CCATGG         P     1 & 5  <B,M,N,P,T
##STR211##  GCTAGC         P     1 & 5  <M,N,P,T
##STR212##  GCGGCCGC       P     2 & 6  <M,N,P,T
##STR213##  TCGCGA         P     3 & 3  <B,M,N,T
##STR214##  CCANNNNN       P     7 & 4  <N
##STR215##  CACGTG         P     3 & 3  <none
##STR216##  RGGWCCY        P     2 & 5  <N
##STR217##  CGGWCCG        P     2 & 5  <N,T
##STR218##  GAGCTC         P     5 & 1  ##STR219##
##STR220##  GTCGAC         P     1 & 5  <B,M,I,N,P,T
##STR221##  CCTNAGG        P     2 & 5  ##STR222## ##STR223##
##STR224##  GGCCNNNNNGGCC  P     8 & 5  <N,P,T
##STR225##  CCCGGG         P     3 & 3  <B,M,I,N,P,T
##STR226##  ACTAGT         P     1 & 5  <M,N,T
##STR227##  GCATGC         P     5 & 1  <B,M,I,N,P,T
##STR228##  AGGCCT         P     3 & 3  ##STR229##
##STR230##  CCWWGG         P     1 & 5  <N,P,T
##STR231##  GTATAC         P     3 & 3  <N(soon)
##STR232##  CTCGAG         P     1 & 5  ##STR233## T ##STR234##
##STR235##  CCCGGG         P     1 & 5  <I,N,P,T
##STR236##  CGGCCG         P     1 & 5  ##STR237## ##STR238##
N_restrct = 43
__________________________________________________________________________
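The recognition sequences of Table 21 are written with the standard IUPAC ambiguity codes (M = A or C, K = G or T, R = A or G, Y = C or T, W = A or T, N = any base). As an illustration only, the following Python sketch shows one way such sites could be located in a candidate sequence. The names `site_pattern` and `find_sites` are hypothetical and are not part of the original disclosure. Every entry in Table 21 carries a P in the Symm column, taken here to mean palindromic, so the sketch scans one strand only; a non-palindromic site would also require a search of the reverse complement.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_pattern(site):
    """Compile a recognition sequence into a regex.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    body = "".join(IUPAC[base] for base in site.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(dna, site):
    """Return 1-based start positions of every match of `site` in `dna`."""
    return [m.start() + 1 for m in site_pattern(site).finditer(dna.upper())]


if __name__ == "__main__":
    # First line of the coding sequence given in Table 22.
    fragment = "atgaagaaatctctggttcttaaggctagc"
    for site in ("CTTAAG", "GCTAGC", "GTMKAC"):
        print(site, find_sites(fragment, site))
```

Applied to the first thirty bases of the Table 22 sequence, the sketch reports CTTAAG starting at base 19 and GCTAGC starting at base 25, and no match for GTMKAC.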

TABLE 22
__________________________________________________________________________
##STR239##
__________________________________________________________________________
pbd mod10 29III88 :
##STR240##
##STR241##
##STR242##
##STR243##
##STR244##
atgaagaaatctctggttcttaaggctagc!10, M13 leader
gttgctgtcgcgaccctggtaccgatgctg!20
tcttttgctcgtccggatttctgtctcgag!30
ccgccatatactgggccctgcaaagcgcgc!40
atcatccgttatttctacaacgctaaagca!50
ggcctgtgccagacctttgtatacggtggt!60
tgccgtgctaagcgtaacaactttaaatcg!70
gccgaagattgcatgcgtacctgcggtggc!80
gccgctgaaggtgatgatccggccaaagcg!90
gcctttaactctctgcaagcttctgctacc!100
gaatatatcggttacgcgtgggccatggtg!110
gtggttatcgttggtgctaccatcggtatc!120
aaactgtttaagaaatttacttcgaaagcg!130
##STR245##
agtctaagcccgcctaatgagcgggcttttttttt!terminator
##STR246##
__________________________________________________________________________
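Each sequence line of Table 22 carries thirty bases followed by a "!" and a trailing annotation; the numbers 10, 20, 30 and so on appear to be a running codon count. As a hedged illustration, the Python sketch below strips the annotation from such a line and translates the remaining bases with the standard genetic code. The helper names `clean_line` and `translate` are hypothetical and do not appear in the original disclosure.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def clean_line(line):
    """Drop everything from the first '!' onward, remove blanks, upper-case."""
    return line.split("!", 1)[0].replace(" ", "").upper()


def translate(dna):
    """Translate complete codons in reading frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    # The first three sequence lines of Table 22, annotations included.
    lines = [
        "atgaagaaatctctggttcttaaggctagc!10, M13 leader",
        "gttgctgtcgcgaccctggtaccgatgctg!20",
        "tcttttgctcgtccggatttctgtctcgag!30",
    ]
    dna = "".join(clean_line(line) for line in lines)
    print(translate(dna))
```

For the three lines shown, the translation is MKKSLVLKASVAVATLVPMLSFARPDFCLE, that is, the leader peptide followed by the first residues of the displayed domain.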

TABLE 23
__________________________________________________________________________
##STR247##
__________________________________________________________________________
##STR248##
DNA Sequence title =
pbd mod10 29III88 : lac-UV5 RsrII/AvrII/gene/TrpA attenuator/MstII; !
  1 C| GGA| CCG| TAT| CCA| GGC| TTT| ACA| CTT| TAT| GCT| TCC| GGC| TCG|
 41 TAT| AAT| GTG| TGG| AAT| TGT| GAG| CGG| ATA| ACA| ATT| CCT| AGG| AGG|
 83 CTC| ACT| ATG| AAG| AAA| TCT| CTG| GTT| CTT| AAG| GCT| AGC| GTT| GCT|
125 GTC| GCG| ACC| CTG| GTA| CCG| ATG| CTG| TCT| TTT| GCT| CGT| CCG| GAT|
167 TTC| TGT| CTC| GAG| CCG| CCA| TAT| ACT| GGG| CCC| TGC| AAA| GCG| CGC|
209 ATC| ATC| CGT| TAT| TTC| TAC| AAC| GCT| AAA| GCA| GGC| CTG| TGC| CAG|
251 ACC| TTT| GTA| TAC| GGT| GGT| TGC| CGT| GCT| AAG| CGT| AAC| AAC| TTT|
293 AAA| TCG| GCC| GAA| GAT| TGC| ATG| CGT| ACC| TGC| GGT| GGC| GCC| GCT|
335 GAA| GGT| GAT| GAT| CCG| GCC| AAA| GCG| GCC| TTT| AAC| TCT| CTG| CAA|
377 GCT| TCT| GCT| ACC| GAA| TAT| ATC| GGT| TAC| GCG| TGG| GCC| ATG| GTG|
419 GTG| GTT| ATC| GTT| GGT| GCT| ACC| ATC| GGT| ATC| AAA| CTG| TTT| AAG|
461 AAA| TTT| ACT| TCG| AAA| GCG| TCT| TAA| TAG| TGA| GGT| TAC| CAG| TCT|
503 AAG| CCC| GCC| TAA| TGA| GCG| GGC| TTT| TTT| TTT| CCT| GAG| G
Total = 539 bases
__________________________________________________________________________
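The row labels of Table 23 (1, 41, 83, ..., 503) follow from the layout: the first row holds a single base followed by thirteen codons, every later full row holds fourteen codons, and the last row holds twelve codons and one final base. The short Python sketch below, offered only as a consistency check, reproduces the labels and the stated total of 539 bases. The function names and default arguments are assumptions made for this illustration.

```python
CODON = 3  # bases per codon


def expected_row_starts(n_rows, first_partial=1, codons_per_row=14):
    """Return the 1-based coordinate that should label each row.

    Row 1 is assumed to hold `first_partial` bases plus
    (codons_per_row - 1) full codons; later rows hold full codons only.
    """
    first_len = first_partial + (codons_per_row - 1) * CODON
    full_len = codons_per_row * CODON
    return [1] + [1 + first_len + i * full_len for i in range(n_rows - 1)]


def total_bases(last_row_start, last_row_codons, trailing_bases):
    """Total length implied by the label and content of the final row."""
    return last_row_start + last_row_codons * CODON + trailing_bases - 1


if __name__ == "__main__":
    starts = expected_row_starts(13)
    print(starts)
    # The final row has twelve codons followed by a single base.
    print(total_bases(starts[-1], last_row_codons=12, trailing_bases=1))
```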

TABLE 24
______________________________________
Summary of Restriction Cuts
______________________________________
##STR249##
##STR250##
##STR251##
##STR252##
##STR253##
##STR254##
##STR255##
##STR256##
##STR257##
##STR258##
##STR259##
##STR260##
##STR261##
##STR262##
##STR263##
##STR264##
##STR265##
##STR266##
##STR267##
##STR268##
##STR269##
##STR270##
##STR271##
##STR272##
##STR273##
##STR274##
##STR275##
##STR276##
##STR277##
##STR278##
##STR279##
##STR280##
##STR281##
##STR282##
##STR283##
##STR284##
##STR285##
##STR286##
##STR287##
##STR288##
##STR289##
##STR290##
##STR291##
##STR292##
##STR293##
##STR294##
##STR295##
##STR296##
##STR297##
##STR298##
##STR299##
##STR300##
Enzymes that do not cut
##STR301##
##STR302##
##STR303##
##STR304##
##STR305##
##STR306##
##STR307##
##STR308##
##STR309##
##STR310##
______________________________________

TABLE 25
__________________________________________________________________________
##STR311##
__________________________________________________________________________
##STR312##   28
##STR313##   52
##STR314##   73
##STR315##   88
##STR316##  118
##STR317##  148
##STR318##  178
##STR319##  208
##STR320##  235
##STR321##  268
##STR322##  295
##STR323##  325
##STR324##  346
##STR325##  361
##STR326##  388
##STR327##  409
##STR328##  424
##STR329##  448
##STR330##  478
##STR331##  502
##STR332##  532
##STR333##  539
Note the following enzyme equivalences,
##STR334##
##STR335##
##STR336##
##STR337##
##STR338##
__________________________________________________________________________

TABLE 26
__________________________________________________________________________
DNA_seq1
__________________________________________________________________________
##STR339##
##STR340##
##STR341##
##STR342##
##STR343##
##STR344##
##STR345##
##STR346##
__________________________________________________________________________

TABLE 27
__________________________________________________________________________
DNA_synth1
__________________________________________________________________________
##STR347##
##STR348##
##STR349##
##STR350##
##STR351##
##STR352##
##STR353##
##STR354##
##STR355##
__________________________________________________________________________

TABLE 28
__________________________________________________________________________
DNA_seq2
__________________________________________________________________________
##STR356##
##STR357##
##STR358##
##STR359##
##STR360##
##STR361##
##STR362##
##STR363##
__________________________________________________________________________

TABLE 29
__________________________________________________________________________
DNA_synth2
__________________________________________________________________________
##STR364##
##STR365##
##STR366##
##STR367##
##STR368##
##STR369##
##STR370##
##STR371##
##STR372##
__________________________________________________________________________

TABLE 30
__________________________________________________________________________
DNA_seq3
__________________________________________________________________________
##STR373##
##STR374##
##STR375##
##STR376##
##STR377##
##STR378##
##STR379##
__________________________________________________________________________

TABLE 31
__________________________________________________________________________
DNA_synth3
__________________________________________________________________________
##STR380##
##STR381##
##STR382##
##STR383##
##STR384##
##STR385##
##STR386##
##STR387##
__________________________________________________________________________

TABLE 32
__________________________________________________________________________
DNA_seq4
__________________________________________________________________________
##STR388##
##STR389##
##STR390##
##STR391##
##STR392##
##STR393##
##STR394##
__________________________________________________________________________

TABLE 33
__________________________________________________________________________
DNA_synth4
__________________________________________________________________________
##STR395##
##STR396##
##STR397##
##STR398##
##STR399##
##STR400##
##STR401##
##STR402##
__________________________________________________________________________

TABLE 34
__________________________________________________________________________
Some interaction sets in BPTI
      Number
Res.  Diff.
#     AAs   Contents                          BPTI  1 2 3 4 5
__________________________________________________________________________
-5    2     D -32                             --
-4    2     E -32                             --
-3    5     T P F Z -29                       --
-2    10    Z3 R3 Q2 T2 H G L K E -18         --
-1    10    D4 T2 P2 Q2 E G N K R -18         --
1     10    R21 A2 K2 H2 P L I T G D          R             5
2     9     P20 R4 A2 H2 N E V F L            P           s 5
3     10    D15 K6 T3 R2 P2 S Y G A L         D           4 s
4     7     F19 D4 L3 Y2 I2 A2 S              F           s 5
5     1     C33                               C           x x
6     10    L11 E5 N4 K3 Q2 I2 Y2 D2 T R      L           4
7     5     L18 E11 K2 S Q                    E         s 4
8     7     P26 H2 A2 I L G F                 P         3 4
9     9     P17 A6 V3 R2 Q L K Y F            P       s 3 4
10    10    Y11 E7 D4 A2 N2 R2 V2 S I D       Y     s   s 4
11    10    T17 P5 A3 R2 I S Q Y V K          T     1 s 3 4
12    2     G32 K                             G     x   x x
13    5     P22 R6 L3 N I                     P     1   s 4 s
14    3     C31 T A                           C     1   s s 5
15    12    K15 R4 Y2 M2 L2 -2 V G A I N F    K     1 s 3 4 s
16    7     A22 G5 Q2 R K D F                 A     1 s s s 5
17    12    R12 K5 A2 Y3 H2 S2 F2 L M T G P   R     1 2 3   s
18    6     I21 M4 F3 L2 V2 T                 I     1 s s   5
19    7     I11 P10 R6 S2 K2 L Q              I     1 2 3   s
20    5     R19 A7 S4 L2 Q                    R     s s s   5
21    4     Y18 F13 W I                       Y       2 s s s
22    6     F13 Y14 H2 A N S                  F       s 3 4
23    2     Y32 F                             Y         s s
24    4     N26 K3 D3 S                       N       s 3
25    10    A12 S5 Q3 P3 W3 L2 T2 K G R       A         s s
26    9     K16 A6 T2 E2 S2 R2 G H V          K       s 3 4
27    5     A18 S8 K3 L2 T2                   A       2 3 4
28    7     G13 K10 N5 Q2 R H M               G       2 s s
29    10    L9 Q7 K7 A2 F2 R2 M G T N         L       2 3
30    1     C33                               C       x x x
31    7     Q12 E11 L4 K2 V2 Y N              Q       2 3 4
32    11    T12 P5 K4 Q3 E2 L2 G V S R A      T       2 3 s
33    1     F33                               F     x x x x
34    11    V11 I8 T3 D2 N2 Q2 F H P R K      V     1 2 3 s
35    2     Y31 W2                            Y     s s s   5
36    3     G27 S5 R                          G     1
37    1     G33                               G     x       x
38    3     C31 T A                           C     1     s 5
39    7     R13 G9 K4 Q3 D2 P M               R     1     4 s
40    2     G22 A11                           A     s     s 5
41    3     N20 K11 D2                        K           4 s
42    9     A11 R9 S4 G3 H2 D Q K N           R           s 5
43    2     N31 G2                            N             s
44    3     N21 R11 K                         N             s
45    2     F32 Y                             F             s
46    8     K24 E2 S2 D H V Y R               K             5
47    2     T19 S14                           S       s     5
48    9     A11 I9 E4 T2 W2 L2 R K D          A       2 s   s
49    7     E19 D6 A2 Q2 K2 T H               E       2     s
50    6     E16 D12 L2 M Q K                  D       s     5
51    1     C33                               C       x     x
52    7     R13 M10 L3 E3 Q2 H V              M       2     s
53    8     R21 Q3 E2 H2 C2 G K D             R       s     5
54    7     T23 A3 V2 E2 I Y K                T             5
55    1     C33                               C             x
56    8     G15 V8 I3 E2 R2 A L S             G
57    8     G19 V4 A3 P2 -2 R L N             G
58    8     A11 -10 P3 K3 S2 Y2 R F           A
59    9     -24 G2 Q E A Y S P R              --
60    6     -28 Q R I G D                     --
61    3     -31 T P                           --
62    2     -32 D                             --
63    2     -32 K                             --
64    2     -32 S                             --
__________________________________________________________________________
s indicates secondary set
x indicates in or close to surface but buried and/or highly conserved.
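
The "Contents" column of Table 34 appears to list, for each position, the residue types observed across the aligned homologous sequences, each followed by its number of occurrences, with a bare letter meaning one occurrence and "-" marking sequences that have no residue at that position. Under that reading, which is an assumption and is not stated in the table itself, the "Diff. AAs" column is the number of distinct symbols in the entry. A minimal Python sketch that tallies an entry this way follows; the function name `parse_contents` is hypothetical.

```python
import re


def parse_contents(contents):
    """Parse a Table 34 'Contents' entry into {symbol: count}.

    Assumed notation (not stated in the table): each token is a one-letter
    residue code or '-' (no residue), optionally followed by a count;
    a token without a number counts as one occurrence.
    """
    counts = {}
    for token in contents.split():
        match = re.fullmatch(r"([A-Z-])(\d*)", token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        symbol, number = match.groups()
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


# Entry for residue 1, copied from Table 34.
residue_1 = parse_contents("R21 A2 K2 H2 P L I T G D")
print(len(residue_1), sum(residue_1.values()))
```

For residue 1 this gives 10 distinct symbols over 33 sequences, which agrees with the tabulated value of 10 in the "Diff. AAs" column.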

TABLE 35
______________________________________
Distances from Cβ to
Tip of Side Group
in Å
Amino Acid type  Distance
______________________________________
A                0.0
C (reduced)      1.8
D                2.4
E                3.5
F                4.3
G                --
H                4.0
I                2.5
K                5.1
L                2.6
M                3.8
N                2.4
P                2.4
Q                3.5
R                6.0
S                1.5
T                1.5
V                1.5
W                5.3
Y                5.7
______________________________________
Notes:
These distances were calculated for standard model parts with all side
groups fully extended.
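
For computational use, the values of Table 35 can be held in a simple lookup. The sketch below is illustrative only: the names `SIDE_GROUP_REACH` and `residues_by_reach` are hypothetical, and glycine, for which the table gives no distance, is stored as `None`.

```python
# Distance in Angstroms from C-beta to the tip of the fully extended side
# group, copied from Table 35. Glycine has no entry in the table.
SIDE_GROUP_REACH = {
    "A": 0.0, "C": 1.8, "D": 2.4, "E": 3.5, "F": 4.3,
    "G": None, "H": 4.0, "I": 2.5, "K": 5.1, "L": 2.6,
    "M": 3.8, "N": 2.4, "P": 2.4, "Q": 3.5, "R": 6.0,
    "S": 1.5, "T": 1.5, "V": 1.5, "W": 5.3, "Y": 5.7,
}


def residues_by_reach():
    """Return residue types ordered from longest to shortest reach.

    Residues without a tabulated distance (glycine) are left out.
    """
    known = {aa: d for aa, d in SIDE_GROUP_REACH.items() if d is not None}
    return sorted(known, key=lambda aa: known[aa], reverse=True)


print(residues_by_reach()[:4])
```

With the tabulated values, the four longest side groups are R, Y, W and K, in that order.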

TABLE 36
__________________________________________________________________________
Distances, BPTI residue set #2
Distances in Å between Cβ.
Hypothetical Cβ was added to each Glycine.
__________________________________________________________________________
       R17   I19   Y21   A27   G28   L29   Q31   T32   V34   A48
__________________________________________________________________________
I19    7.7
Y21   15.1   8.4
A27   22.6  17.1  12.2
G28   26.6  20.4  13.8   5.3
L29   22.5  15.8   9.6   5.1   5.2
Q31   16.1  10.4   6.8   6.8  10.6   6.8
T32   11.7   5.2   6.1  12.0  15.5  10.9   5.4
V34    5.6   6.5  11.6  17.6  21.7  18.0  11.4   8.2
A48   18.5  11.0   5.4  12.6  13.3   8.4   8.8   8.3  15.7
E49   22.0  14.7   8.9  16.9  16.1  12.2  13.9  13.3  19.8   5.5
M52   23.6  16.3   8.6  12.2  10.3   7.6  11.3  13.2  20.0   6.2
P9    14.0  11.3   9.0  12.2  15.4  13.3   7.9   9.2   8.7  13.9
T11    9.5  11.2  13.5  18.8  22.5  19.8  13.5  12.1   5.7  18.5
K15    7.9  14.6  20.1  27.4  31.3  27.9  21.4  18.1  10.3  24.6
A16    5.5  10.1  15.9  25.2  28.5  24.6  18.6  14.5   8.6  19.8
I18    6.1   6.0  11.2  21.3  24.4  20.2  14.7  10.4   7.0  15.0
R20   10.6   5.9   5.4  16.0  18.5  14.6   9.8   6.9   7.8  10.2
F22   15.6  10.9   5.6  10.5  12.8  10.3   6.2   8.1  10.8  10.3
N24   19.9  14.7   9.4   4.1   7.3   6.1   4.8  10.0  14.7  11.4
K26   24.4  20.1  15.2   5.4   7.7   9.8  10.1  15.3  19.0  17.0
C30   18.9  12.1   4.6   8.8   9.5   5.3   5.9   8.2  14.9   4.9
F33   10.8   7.4   7.7  12.6  16.4  13.0   6.6   5.6   5.5  12.2
Y35    8.4   7.4   9.4  18.4  21.4  17.9  12.2   9.5   5.8  14.4
S47   17.6  10.6   6.6  17.3  17.9  13.4  12.6  10.4  15.9   5.3
D50   20.0  13.6   7.2  17.2  16.8  13.5
    13.5                                                                             12.9                                                                             17.6                                                                             7.6                                          C51 18.9                                                                             12.2                                                                              4.0                                                                              12.1                                                                             12.2                                                                             8.8 8.8                                                                              9.7                                                                              15.3                                                                             5.4                                          R53 25.4                                                                             18.6                                                                              11.0                                                                             17.2                                                                             15.0                                                                             13.0                                                                              15.7                                                                             16.7                                                                             22.3                                                                             9.7                                          R39 15.4                                                                             16.9                                                                              17.1                                                       
                      24.9                                                                             27.2                                                                             24.9                                                                              20.1                                                                             18.7                                                                             13.8                                                                             22.3                                         __________________________________________________________________________        E49                                                                              M52 P9 T11                                                                              K15                                                                              A16 I18                                                                              R20                                                                              F22                                                                              N24                                          __________________________________________________________________________    M52 6.1                                                                       P9  17.7 15.5                                                                 T11 22.1                                                                             21.5                                                                              7.2                                                                K15 27.5                                                                             28.7                                                                              16.4                                                                             9.5                                                             A16 22.2                                  
                                           24.2                                                                              14.9                                                                             9.8                                                                              6.2                                                          I18 17.4                                                                             19.5                                                                              12.2                                                                             9.5                                                                              10.4                                                                             4.9                                                       R20 13.0                                                                             13.8                                                                              8.0                                                                              9.4                                                                              14.9                                                                             10.6                                                                              6.2                                                   F22 13.8                                                                             11.4                                                                              4.1                                                                              10.6                                                                             19.1                                                                             16.3                                                                              12.7                                                                             6.9                                                N24 15.6      
                                                                       11.2                                                                              8.4                                                                              15.3                                                                             24.1                                                                             21.9                                                                              18.2                                                                             12.7                                                                             6.6                                             K26 20.9                                                                             15.7                                                                              12.1                                                                             18.6                                                                             27.9                                                                             26.6                                                                              23.3                                                                             18.1                                                                             11.6                                                                             5.9                                          C30 8.7                                                                              5.6 10.6                                                                             16.6                                                                             24.1                                                                             20.2                                                                              15.7                                                                             9.8                                  
                                            6.8                                                                              6.9                                          F33 16.5                                                                             15.4                                                                              4.2                                                                              7.1                                                                              15.0                                                                             12.8                                                                              9.6                                                                              6.1                                                                              5.6                                                                              9.3                                          Y35 17.2                                                                             17.8                                                                              7.8                                                                              5.8                                                                              11.0                                                                             7.6 4.9                                                                              4.3                                                                              8.8                                                                              14.8                                         S47 4.7                                                                              9.1 15.3                                                                             18.5                                                                             23.1                                                                             17.6            
                                                                  12.8                                                                             9.1                                                                              12.0                                                                             15.3                                         D50 5.5                                                                              7.7 14.7                                                                             18.6                                                                             24.2                                                                             19.2                                                                              14.7                                                                             9.9                                                                              11.0                                                                             14.7                                         C51 7.1                                                                              5.4 11.0                                                                             16.4                                                                             23.5                                                                             19.2                                                                              14.6                                                                             8.7                                                                              6.9                                                                              9.6                                          R53 6.3                                                                              5.6 17.9                                                                             23.1                                                                    
         29.6                                                                             24.8                                                                              20.3                                                                             15.0                                                                             13.8                                                                             15.5                                         R39 23.9                                                                             24.0                                                                              13.0                                                                             9.5                                                                              12.0                                                                             11.8                                                                              12.5                                                                             12.8                                                                             14.7                                                                             20.8                                         __________________________________________________________________________        K26                                                                              C30 F33                                                                              Y35                                                                              S47                                                                              D50 C51                                                                              R53                                                __________________________________________________________________________    C30 12.4                                                                      F33 13.9                                                  
                           10.1                                                                   Y35 19.5                                                                             13.5                                                                              6.4                                                                S47 21.0                                                                             8.8 13.5                                                                             13.2                                                            D50 20.1                                                                             8.6 14.3                                                                             13.7                                                                             5.0                                                          C51 15.0                                                                             3.7 10.9                                                                             12.5                                                                             6.9                                                                              5.2                                                       R53 19.9                                                                             9.9 18.2                                                                             18.8                                                                             9.4                                                                              5.8 7.4                                                   R39 24.3                                                                             20.6                                                                              14.4                                                                             9.6                                                                              20.4         
                                                                    19.0                                                                              18.8                                                                             23.4                                               __________________________________________________________________________

                                 TABLE 37
__________________________________________________________________________
vgDNA to vary BPTI set #2.1
__________________________________________________________________________
##STR403##
##STR404##
##STR405##
##STR406##
Overlap = 12 (7 CG, 5 AT)
##STR407##
##STR408##
k = equal parts of T and G; m = equal parts of C and A;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above
##STR409##
Parent = 1/(5.5 × 10.sup.7); least favored = 1/(4.2 × 10.sup.9)
Least favored one-amino-acid substitution from PPBD present at 1 in 1.6 × 10.sup.7
__________________________________________________________________________
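The nucleotide mixtures defined in the legend above (q, f, k, m) determine how often each amino acid is encoded at a variegated codon. The following is a minimal sketch of that arithmetic, not the procedure used to design the oligonucleotides. It assumes, purely for illustration, a codon whose three positions are synthesized with the mixtures q, f, and k in that order; the actual variegated codons are in the sequence listings (shown here only as ##STR## placeholders). The names `MIXES` and `codon_distribution` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotide mixtures copied from the table legend.
MIXES = {
    "q": {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30},
    "f": {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22},
    "k": {"T": 0.50, "G": 0.50},
    "m": {"C": 0.50, "A": 0.50},
}

def codon_distribution(symbols):
    """Return {amino acid: probability} for a three-symbol variegated codon.

    Assumes each codon position is drawn independently from its mixture.
    """
    if len(symbols) != 3:
        raise ValueError("a codon has exactly three positions")
    dist = defaultdict(float)
    position_mixes = [MIXES[s].items() for s in symbols]
    for (b1, p1), (b2, p2), (b3, p3) in product(*position_mixes):
        dist[CODON_TABLE[b1 + b2 + b3]] += p1 * p2 * p3
    return dict(dist)

# ASSUMPTION: "qfk" is an illustrative arrangement, not read from the listings.
dist = codon_distribution("qfk")
for aa, p in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{aa}: {p:.4f}")
```

Under this assumed arrangement the third position is restricted to T or G, so the only stop codon that can occur is TAG, with probability .26 × .40 × .50 = .052.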

                                 TABLE 38
__________________________________________________________________________
Result of varying set# 2 of BPTI 2.1
__________________________________________________________________________
##STR410##
##STR411##
##STR412##
##STR413##
##STR414##
##STR415##
##STR416##
__________________________________________________________________________

                                 TABLE 39
__________________________________________________________________________
vgDNA to vary set#2 BPTI 2.2
__________________________________________________________________________
##STR417##
##STR418##
##STR419##
Overlap = 15 (11 CG, 4 AT)
##STR420##
##STR421##
##STR422##
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above
##STR423##
Parent = 1/(4.4 × 10.sup.7); least favored = 1/(1.25 × 10.sup.9)
Least favored one-amino-acid substitution from PPBD present at 1 in 1.2 × 10.sup.7
__________________________________________________________________________

                                 TABLE 40
__________________________________________________________________________
Result of varying set# 2 of BPTI 2.2
__________________________________________________________________________
##STR424##
##STR425##
##STR426##
##STR427##
##STR428##
##STR429##
##STR430##
__________________________________________________________________________

                                 TABLE 41
__________________________________________________________________________
vgDNA to vary set#2 of BPTI 2.3
__________________________________________________________________________
##STR431##
##STR432##
##STR433##
Overlap = 13 (7 CG, 6 AT)
##STR434##
##STR435##
k = equal parts of T and G; m = equal parts of C and A;
w = equal parts of A and T; n = equal parts of A, C, G, T;
d = equal parts of A, G, T; v = equal parts of A, C, G;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above
##STR436##
##STR437##
Parent = 1/(1 × 10.sup.7); least favored = 1/(4 × 10.sup.8)
Least favored one-amino-acid substitution from PPBD present at 1 in 3 × 10.sup.7
__________________________________________________________________________

                                 TABLE 42
__________________________________________________________________________
Result of varying set#2 of BPTI 2.3
__________________________________________________________________________
##STR438##
##STR439##
##STR440##
##STR441##
##STR442##
##STR443##
##STR444##
__________________________________________________________________________

                                                  TABLE 50
________________________________________________________________________________________________________________
                  Number of
IPBD              Amino Acids  Structure    Cross Links         Secreted  Source Organism   AfM
________________________________________________________________________________________________________________
Preferred IPBDs
Aprotinin         58           X-ray, NMR   3 SS                yes       Bos taurus        trypsin
                                            5-55, 14-38, 30-51
                                            (1:6, 2:4, 3:5)
Crambin           46           X-ray, NMR   3 SS                yes       rape seed         ?, Mab
CMTI-III          26           NMR          3 SS                yes       cucumber          trypsin
ST-I.sub.A        13           NMR          3 SS                yes       E. coli           Mabs &
                                                                                            guanylate cyclase
Third domain,     56           X-ray, NMR   3 SS                yes       Coturnix coturnix trypsin
ovomucoid                                                                 japonica
Ribonuclease A    124          X-ray, NMR                       yes       Bos taurus        RNA, DNA
Ribonuclease      104          X-ray, NMR?                      yes       A. oryzae         RNA, DNA
Lysozyme          129          X-ray, NMR?  4 SS                yes       Gallus gallus     NAG-NAM-NAG
Azurin            128          X-ray        Cu:Cys,                       P. aeruginosa     Mab
                                            HIS.sup.2, MET
Characteristics of Known IPBDs
α-Conotoxins      13-15        NMR          2 SS                yes       Conus snails      Receptor
μ-Conotoxins      20-25        NMR          3 SS                yes       Conus snails      Receptor
Ω-Conotoxins      25-30        --           3 SS                yes       Conus snails      Receptor
King-kong         25-30        --           3 SS                yes       Conus snails      Mabs
peptides
Nuclease          141          X-ray        none                yes       S. aureus         RNA, DNA
(staphylococcal)
Charybdotoxin     37           NMR          3 SS                yes       Leiurus           Ca.sup.+2 -dependent
(scorpion toxin)                            7-28, 13-33, 17-35            quinquestriatus   K.sup.+ channel
                                            (1:4, 2:5, 3:6)               hebraeus
Apamin            12           NMR          2 SS                yes       Bees              Mabs,
(bee venom)                                 (1:3, 2:4)                                      Receptor(?)
Other suitable IPBDs
    Ferredoxin
    Secretory trypsin inhibitor
    Soybean trypsin inhibitor
    SLPI (Secretory Leukocyte Protease Inhibitor) (THOM86) and SPAI (ARAK90)
    Cystatin and homologues (MACH89, STUB90)
    Eglin (MCPH85)
    Barley inhibitor (CLOR87a, CLOR87b, SVEN82)
________________________________________________________________________________________________________________

TABLE 101a
__________________________________________________________________________
##STR445##
pbd mod14: 9 V 89 : Sequence cloned into pGEM-MB1
##STR446##
##STR447##
__________________________________________________________________________
##STR448##
##STR449##
##STR450##
##STR451##
atg aag aaa tct ctg gtt ctt aag gct agc   ! 10, M13 leader
gtt gct gtc gcg acc ctg gta cct atg ttg   ! 20 <- codon #
tcc ttc gct cgt ccg gat ttc tgt ctc gag   ! 30
cca cca tac act ggg ccc tgc aaa gcg cgc   ! 40
atc atc cgC tat ttc tac aat gct aaa gca   ! 50
ggc ctg tgc cag acc ttt gta tac ggt ggt   ! 60
tgc cgt gct aag cgt aac aac ttt aaa tcg   ! 70
gcc gaa gat tgc atg cgt acc tgc ggt ggc   ! 80
gcc gct gaa ggt gat ccg gcc aaG gcg       ! 90
gcc ttc aat tct ctG caa gct tct gct acc   ! 100
gag tat att ggt tac gcg tgg gcc atg gtg   ! 110
gtg gtt atc gtt ggt gct acc atc ggg atc   ! 120
aaa ctg ttc aag aag ttt act tcg aag gcg   ! 130
##STR452##
AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT   ! terminator
##STR453##
(GACCTGCAGGCATGCAAGCTT...-3')             ! pGEM polylinker
__________________________________________________________________________
Notes:
.sup.a Designed sequence contained AGGAGG, but sequencing indicates that actual DNA contains AGAGG.
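
The reading frame of the coding region listed in Table 101a can be checked by translating it codon by codon. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and do not appear in the original disclosure. Stop codons are written as "." to follow the convention of Table 102c. Only the first three rows of the table (codons 1-30) are used as the example input.

```python
# Illustrative sketch only: translate a DNA coding sequence with the
# standard genetic code. Names used here are hypothetical.
from itertools import product

BASES = "tcag"
# Amino acids in t/c/a/g order for first, second and third base;
# "." marks a stop codon (the convention used in Table 102c).
AMINO = "FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence; whitespace and letter case are ignored."""
    seq = "".join(dna.split()).lower()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Codons 1-30 as listed in Table 101a.
first_30_codons = """
atg aag aaa tct ctg gtt ctt aag gct agc
gtt gct gtc gcg acc ctg gta cct atg ttg
tcc ttc gct cgt ccg gat ttc tgt ctc gag
"""

print(translate(first_30_codons))
```

The translated stretch begins with the gpVIII-BPTI signal peptide shown in Table 106 and runs into the first residues of the BPTI domain.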

TABLE 101b
__________________________________________________________________________
##STR454##
##STR455##
in PstI site of pGEM-MB1.
__________________________________________________________________________
##STR456##
##STR457##
##STR458##
##STR459##
##STR460##
##STR461##
atgaagaaatctctggttcttaag gctagc    ! 10, M13 leader
gttgctgtcgcgaccctggtacctatgttg     ! 20 <- codon #
tccttcgctcgtccggatttctgtctcgag     ! 30
ccaccatacactgggccctgcaaagcgcgc     ! 40
atcatccgCtatttctacaatgctaaagca     ! 50
ggcctgtgccagacctttgtatacggtggt     ! 60
tgccgtgctaagcgtaacaactttaaatcg     ! 70
gccgaagattgcatgcgtacctgcggtggc     ! 80
gccgctgaaggt gatgatccggccaaGgcg    ! 90
gccttcaattctctGcaagcttctgctacc     ! 100
gagtatattggttacgcgtgggccatggtg     ! 110
gtggttatcgttggtgctaccatcgggatc     ! 120
aaactgttcaagaagtttacttcgaaggcg     ! 130
##STR462##
ACTCTAAGCCCGCCTAATGAGCGGGCTTTTTTTTT ! terminator
aTCGAGACctgcaGGTCGACCggcatgc-3'
##STR463##
__________________________________________________________________________

TABLE 102a
__________________________________________________________________________
Annotated Sequence of gene found in pGEM-MB1
__________________________________________________________________________
nucleotide
number
##STR464##
##STR465##
##STR466##
##STR467##
##STR468##
##STR469##
##STR470##
##STR471##
##STR472##
##STR473##
##STR474##
##STR475##
##STR476##
##STR477##
##STR478##
##STR479##
##STR480##
##STR481##
##STR482##
##STR483##
##STR484##
Notes:
.sup.a The design called for the Shine-Dalgarno sequence AGGAGG, but sequencing shows that the actual constructed gene contains AGAGG.
Note the following enzyme equivalences:
##STR485##
##STR486##
__________________________________________________________________________

TABLE 102b
__________________________________________________________________________
Annotated Sequence of gene after insertion of SalI linker
__________________________________________________________________________
nucleotide
number
##STR487##
##STR488##
##STR489##
##STR490##
##STR491##
##STR492##
##STR493##
##STR494##
##STR495##
##STR496##
##STR497##
##STR498##
##STR499##
##STR500##
##STR501##
##STR502##
##STR503##
##STR504##
##STR505##
##STR506##
##STR507##
##STR508##
Note the following enzyme equivalences:
##STR509##
##STR510##
__________________________________________________________________________

TABLE 102c
______________________________________
Calculated properties of Peptide
______________________________________
For the apoprotein
Molecular weight of peptide =   16192
Charge on peptide =             9
[A+G+P] =                       36
[C+F+H+I+L+M+V+W+Y] =           48
[D+E+K+R+N+Q+S+T+.] =           48
For the mature protein
Molecular weight of peptide =   13339
Charge on peptide =             6
[A+G+P] =                       31
[C+F+H+I+L+M+V+W+Y] =           37
[D+E+K+R+N+Q+S+T+.] =           41
______________________________________

TABLE 102d
______________________________________
Codon Usage
First   Second Base
Base    t     c     a     g     Third base
______________________________________
t       3     4     2     1     t
        5     1     4     5     c
        0     0     0     0     a
        1     2     0     1     g
c       1     1     0     4     t
        1     1     0     2     c
        0     2     1     0     a
        5     2     1     0     g
a       1     2     2     0     t
        5     5     2     1     c
        0     0     5     0     a
        4     0     7     0     g
g       4     9     4     6     t
        1     5     0     2     c
        2     1     2     0     a
        2     5     2     2     g
______________________________________
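
A codon-usage grid with the layout of Table 102d (first base down the side, second base across, third base as the sub-row) can be tallied directly from a coding sequence. The sketch below is illustrative only; the function names `codon_usage` and `usage_grid` are hypothetical, and the example input is just the first ten codons of the synthetic gene rather than the full sequence, so its counts are not those of Table 102d.

```python
# Illustrative sketch only: tally codons and arrange the counts in the
# first-base / second-base / third-base layout used by Table 102d.
from collections import Counter

BASES = "tcag"


def codon_usage(dna: str) -> Counter:
    """Count each codon in a coding sequence (whitespace and case ignored)."""
    seq = "".join(dna.split()).lower()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


def usage_grid(counts: Counter) -> dict:
    """Map (first base, third base) to counts for second base t, c, a, g."""
    return {
        (first, third): [counts[first + second + third] for second in BASES]
        for first in BASES
        for third in BASES
    }


# Example input: codons 1-10 of the synthetic gene (M13 leader region).
counts = codon_usage("atg aag aaa tct ctg gtt ctt aag gct agc")
grid = usage_grid(counts)

# Print in the same row order as Table 102d.
for first in BASES:
    for third in BASES:
        row = "  ".join(str(n) for n in grid[(first, third)])
        label = first if third == "t" else " "
        print(f"{label}  {row}  {third}")
```

Applying `codon_usage` to a complete coding sequence gives per-codon totals that can be compared against a published table entry by entry.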

TABLE 102e
______________________________________
Amino-acid frequency
AA   #     AA   #     AA   #     AA   #
______________________________________
Encoded polypeptide
A    20    C    6     D    4     E    4
F    8     G    10    H    0     I    6
K    12    L    8     M    4     N    4
P    6     Q    2     R    6     S    8
T    7     V    9     W    1     Y    6
.    1
Mature protein
A    16    C    6     D    4     E    4
F    7     G    10    H    0     I    6
K    9     L    4     M    2     N    4
P    5     Q    2     R    6     S    5
T    6     V    5     W    1     Y    6
______________________________________
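
The residue-class totals of Table 102c can be cross-checked against the per-residue counts of Table 102e. The following sketch is illustrative only: the names are hypothetical, and it assumes that the stop symbol "." is counted once in the third class for both the encoded polypeptide and the mature protein (the mature list in Table 102e does not show it). It also compares the residues removed on maturation with the gpVIII-BPTI signal peptide read from Table 106.

```python
# Illustrative sketch only: cross-check Table 102c class totals against
# the amino-acid counts of Table 102e. Names here are hypothetical.
from collections import Counter

# Counts transcribed from Table 102e (stop symbol handled separately).
ENCODED = dict(A=20, C=6, D=4, E=4, F=8, G=10, H=0, I=6, K=12, L=8,
               M=4, N=4, P=6, Q=2, R=6, S=8, T=7, V=9, W=1, Y=6)
MATURE = dict(A=16, C=6, D=4, E=4, F=7, G=10, H=0, I=6, K=9, L=4,
              M=2, N=4, P=5, Q=2, R=6, S=5, T=6, V=5, W=1, Y=6)

STOP_COUNT = 1  # assumption: "." contributes one to the third class

# Residue classes as bracketed in Table 102c.
CLASSES = ("AGP", "CFHILMVWY", "DEKRNQST")


def class_totals(counts: dict, stop: int = STOP_COUNT) -> tuple:
    """Return the three class totals in the order listed in Table 102c."""
    first, second, third = (sum(counts[aa] for aa in cls) for cls in CLASSES)
    return first, second, third + stop


def removed_on_maturation(encoded: dict, mature: dict) -> Counter:
    """Residues present in the encoded polypeptide but not the mature one."""
    return Counter(encoded) - Counter(mature)


# gpVIII-BPTI signal peptide as read from Table 106 (up to the "/" site).
SIGNAL = "MKKSLVLKASVAVATLVPMLSFA"

print(class_totals(ENCODED))
print(class_totals(MATURE))
print(removed_on_maturation(ENCODED, MATURE) == Counter(SIGNAL))
```

Under these assumptions the class totals agree with Table 102c, and the difference between the two composition lists corresponds to the 23-residue signal peptide.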

TABLE 102f
______________________________________
Enzymes used to manipulate BPTI-gp8 fusion
______________________________________
##STR511##
##STR512##
##STR513##
##STR514##
##STR515##
##STR516##
##STR517##
##STR518##
##STR519##
##STR520##
##STR521##
##STR522##
##STR523##
##STR524##
##STR525##
##STR526##
##STR527##
##STR528##
##STR529##
##STR530##
##STR531##
##STR532##
##STR533##
##STR534##
##STR535##
##STR536##
##STR537##
______________________________________

TABLE 103
__________________________________________________________________________
##STR538##
Underscored bases indicate sites of overlap between annealed synthetic duplexes.
__________________________________________________________________________
5'-
##STR539##
##STR540##
##STR541##
##STR542##
##STR543##
##STR544##
##STR545##
##STR546##
##STR547##
##STR548##
##STR549##
##STR550##
##STR551##
##STR552##
##STR553##
##STR554##
##STR555##
##STR556##
##STR557##
##STR558##
__________________________________________________________________________

TABLE 104
__________________________________________________________________________
Definition and alignment of oligonucleotides
__________________________________________________________________________
##STR559##
##STR560##
##STR561##
##STR562##
##STR563##
##STR564##
##STR565##
##STR566##
##STR567##
##STR568##
##STR569##
##STR570##
##STR571##
__________________________________________________________________________

TABLE 105
__________________________________________________________________________
Individual sequences of Oligonucleotides 801-817.
__________________________________________________________________________
##STR572##
##STR573##
##STR574##
##STR575##
##STR576##
##STR577##
##STR578##
##STR579##
##STR580##
##STR581##
##STR582##
##STR583##
##STR584##
##STR585##
##STR586##
##STR587##
__________________________________________________________________________

TABLE 106
__________________________________________________________________________
Signal Peptides
__________________________________________________________________________
PhoA          .sub.--K q s t i a l a l l p
MalE          M .sub.--K l .sub.--K T G A .sub.--T i l a l s a l t t
OmpF          M M .sub.--K .sub.--R n i l a v i v p a
Bla           M S i Q H F .sub.--R v a l i p f f
LamB          M M I T L .sub.--R .sub.--K l p l a v a v a a
Lpp           M .sub.--K .sub.--K l l f a i p
gpIII         M .sub.--K .sub.--K l l f a i p
gpIII-BPTI    M .sub.--K .sub.--K l l f a i p
gpVIII        M .sub.--K .sub.--K S L V L .sub.--K a s v a v a
gpVIII-BPTI   M .sub.--K .sub.--K S L V L .sub.--K a s v a v a
gpVIII'       M .sub.--K .sub.--K s l v l l a s v a v a
__________________________________________________________________________
PhoA          l l f t p v t .sub.--K A / .sub.--R T . . .                    (17)
MalE          m m f s a s a l a / .sub.--K I . . .                           (18)
OmpF          l l v a g t a n a /a .sub.--E . . .                            (19)
Bla           a a f c l p v f a /h p . . .                                   (>18)
LamB          g v m s a q a m a /v .sub.--D . . .                            (19)
Lpp           i l g s t l l a g /c s . . .                                   (>17)
gpIII         l v v p f y s h s /a .sub.--E T V .sub.--E . . .               (16)
gpIII-BPTI    l v v p f y s g a / .sub.--R P .sub.--D . . .                  (15)
gpVIII        t l v p m l s f a /a .sub.--E G .sub.--D .sub.--D . . .        (16)
gpVIII-BPTI   t l v p m l s f a / .sub.--R P .sub.--D . . .                  (15)
gpVIII'       t l v p m l s f a /a .sub.--E G .sub.--D .sub.--D . . .        (21)
__________________________________________________________________________

TABLE 107
__________________________________________________________________________
In vitro transcription/translation analysis of vector-encoded
signal::BPTI::mature VIII protein species
                    31 kd species.sup.a    14.5 kd species.sup.b
__________________________________________________________________________
No DNA (control)    -.sup.c                -
pGEM-3Zf(-)         +                      -
pGEM-MB16           +                      -
pGEM-MB20           +                      +
pGEM-MB26           +                      +
pGEM-MB42           +                      +
pGEM-MB46           ND                     ND
__________________________________________________________________________
Notes:
.sup.a pre-beta-lactamase, encoded by the amp (bla) gene.
.sup.b pre-BPTI/VIII peptides encoded by the synthetic gene and derived constructs.
.sup.c - for absence of product; + for presence of product; ND for Not Determined.

TABLE 108
__________________________________________________________________________
Western analysis.sup.a of in vivo expressed
signal::BPTI::mature VIII protein species
               signal   14.5 kd species.sup.b   12 kd species.sup.c
__________________________________________________________________________
A) expression in strain XL1-Blue
pGEM-3Zf(-)    -        -.sup.d                 -
pGEM-MB16      VIII     -                       -
pGEM-MB20      VIII     ++                      -
pGEM-MB26      VIII     +++                     +/-
pGEM-MB42      phoA     ++                      +
B) expression in strain SEF'
pGEM-MB42      phoA     +/-                     +++
__________________________________________________________________________
Notes:
.sup.a Analysis using rabbit anti-BPTI polyclonal antibodies and horseradish-peroxidase-conjugated goat anti-rabbit IgG antibody.
.sup.b pro-BPTI/VIII peptides encoded by the synthetic gene and derived constructs.
.sup.c processed BPTI/VIII peptide encoded by the synthetic gene.
.sup.d - for not present; +/- for weakly present; + for present; ++ for strong presence; +++ for very strong presence.

                                      TABLE 109
__________________________________________________________________________
M13 gene III
__________________________________________________________________________
1579       5'-GT  GAAAAAATTA  TTATTCGCAA  TTCCTTTAGT
1611  TGTTCCTTTC  TATTCTCACT  CCGCTGAAAC  TGTTGAAAGT
1651  TGTTTAGCAA  AACCCCATAC  AGAAAATTCA  TTTACTAACG
1691  TCTGGAAAGA  CGACAAAACT  TTAGATCGTT  ACGCTAACTA
1731  TGAGGGTTGT  CTGTGGAATG  CTACAGGCGT  TGTAGTTTGT
1771  ACT GGTGACG  AAACTCAGTG  TTACGGTACA  TGGGTTCCTA
1811  TTGGGCTTGC  TATCCCTGAA  AATGAGGGTG  GTGGCTCTGA
1851  GGGTGGCGGT  TCTGAGGGTG  GCGGTTCTGA  GGGTGGCGGT
1891  ACTAAACCTC  CTGAGTACGG  TGATACACCT  ATTCCGGGCT
1931  ATACTTATAT  CAACCCTCTC  GACGGCACTT  ATCCGCCTGG
1971  TACTGAGCAA  AACCCCGCTA  ATCCTAATCC  TTCTCTTGAG
2011  GAGTCTCAGC  CTCTTAATAC  TTTCATGTTT  CAGAATAATA
2051  GGTTCCGAAA  TAGGCAGGGG  GCATTAACTG  TTTATACGGG
2091  CACTGTTACT  CAAGGCACTG  ACCCCGTTAA  AACTTATTAC
2131  CAGTACACTC  CTGTATCATC  AAAAGCCATG  TATGACGCTT
2171  ACTGGAACGG  TAAATTCAGA  GACTGCGCTT  TCCATTCTGG
2211  CTTTAATGAG  GATCCAT TCG  TTTGTGAATA  TCAAGGCCAA
2251  TCGTCTGACC  TGCCTCAACC  TCCTGTCAAT  GCTGGCGGCG
2291  GCTCTGGTGG  TGGTTCTGGT  GGCGGCTCTG  AGGGTGGTGG
2331  CTCTGAGGGT  GGCGGTTCTG  AGGGTGGCGG  CTCTGAGGGA
2371  GGCGGTTCCG  GTGGTGGCTC  TGGTTCCGGT  GATTTTGATT
2411  ATGAAAAGAT  GGCAAACGCT  AATAAGGGGG  CTATGACCGA
2451  AAATGCCGAT  GAAAACGCGC  TACAGTCTGA  CGCTAAA GGC
2491  AAACTTGATT  CTGTCGCTAC  TGATTACGGT  GCTGCTATCG
2531  ATGGTTTCAT  TGGTGACGTT  TCCGGCCTTG  CTAATGGTAA
2571  TGGTGCTACT  GGTGATTTTG  CTGGCTCTAA  TTCCCAAATG
2611  GCTCAAGTCG  GTGACGGTGA  TAATTCACCT  TTAATGAATA
2651  ATTTCCGTCA  ATATTTACCT  TCCCTCCCTC  AATCGGTTGA
2691  ATGTCGCCCT  TTTGTCT TTA  GCGCTGGTAA  ACCATATGAA
2731  TTTTCTATTG  ATTGTGACAA  AATAAACTTA  TTCCGTGGTG
2771  TCTTTGCGTT  TCTTTTATAT  GTTGCCACCT  TTATGTATGT
2811  ATTTTCTACG  TTTGCTAACA  TACTGCGTAA  TAAGGAGTCT
2851  TAATCATGCC  AGTTCTTTTG  GGTATTCCGT
__________________________________________________________________________

                                      TABLE 110
__________________________________________________________________________
##STR588##
__________________________________________________________________________
##STR589## ##STR590## ##STR591## ##STR592##
##STR593## ##STR594## ##STR595## ##STR596##
__________________________________________________________________________

                                      TABLE 111
__________________________________________________________________________
##STR597##
__________________________________________________________________________
##STR598## ##STR599## ##STR600## ##STR601## ##STR602## ##STR603##
##STR604## ##STR605## ##STR606## ##STR607## ##STR608## ##STR609##
##STR610## ##STR611## ##STR612## ##STR613## ##STR614## ##STR615##
__________________________________________________________________________

                                      TABLE 112
__________________________________________________________________________
Annotated Sequence of Ptac::RBS(GGAGGAAATAAA)::
VIII-signal::mature-bpti::mature-VIII-coat-protein gene
__________________________________________________________________________
##STR616## ##STR617## ##STR618## ##STR619## ##STR620## ##STR621##
##STR622## ##STR623## ##STR624## ##STR625## ##STR626## ##STR627##
##STR628## ##STR629## ##STR630## ##STR631## ##STR632## ##STR633##
##STR634## ##STR635## ##STR636##
__________________________________________________________________________

                                      TABLE 113
__________________________________________________________________________
Annotated Sequence of pGEM-MB42 comprising Ptac::RBS(GGAGGAAATAAA)::
phoA-signal::mature-bpti::mature-VIII-coat-protein
__________________________________________________________________________
##STR637## ##STR638## ##STR639## ##STR640## ##STR641## ##STR642##
##STR643## ##STR644## ##STR645## ##STR646## ##STR647## ##STR648##
##STR649## ##STR650## ##STR651## ##STR652## ##STR653## ##STR654##
##STR655## ##STR656##
__________________________________________________________________________

                  TABLE 114
______________________________________
Neutralization of Phage Titer Using
Agarose-immobilized Anhydro-Trypsin

                          Percent Residual Titer
                          As a Function of Time (hours)
Phage Type   Addition     1       2       4
______________________________________
MK-BPTI       5 μl IS     99      104     105
              2 μl IAT    82      71      51
              5 μl IAT    57      40      27
             10 μl IAT    40      30      24
MK            5 μl IS     106     96      98
              2 μl IAT    97      103     95
              5 μl IAT    110     111     96
             10 μl IAT    99      93      106
______________________________________
Legend:
IS = Immobilized streptavidin
IAT = Immobilized anhydrotrypsin

                  TABLE 115
______________________________________
Affinity Selection of MK-BPTI Phage
on Immobilized Anhydro-Trypsin

                          Percent of Total Phage
Phage Type   Addition     Recovered in Elution Buffer
______________________________________
MK-BPTI       5 μl IS     <<1.sup.a
              2 μl IAT      5
              5 μl IAT     20
             10 μl IAT     50
MK            5 μl IS     <<1.sup.a
              2 μl IAT    <<1
              5 μl IAT    <<1
             10 μl IAT    <<1
______________________________________
Legend:
IS = Immobilized streptavidin
IAT = Immobilized anhydrotrypsin
.sup.a not detectable.

                                      TABLE 116
__________________________________________________________________________
translation of Signal-III::bpti::mature-III
__________________________________________________________________________
##STR657## ##STR658## ##STR659## ##STR660## ##STR661## ##STR662##
##STR663## ##STR664## ##STR665## ##STR666## ##STR667## ##STR668##
##STR669## ##STR670## ##STR671## ##STR672## ##STR673## ##STR674##
##STR675## ##STR676## ##STR677## ##STR678## ##STR679## ##STR680##
##STR681## ##STR682## ##STR683## ##STR684## ##STR685## ##STR686##
##STR687## ##STR688## ##STR689##
__________________________________________________________________________
##STR690## ##STR691## ##STR692## ##STR693## ##STR694## ##STR695##

                  TABLE 130
______________________________________
Sampling of a Library encoded by (NNK).sup.6
______________________________________
A.  Numbers of hexapeptides in each class
    total = 64,000,000 stop-free sequences.
    α can be one of [WMFYCIKDENHQ]
    Φ can be one of [PTAVG]
    Ω can be one of [SLR]

αααααα =  2985984.      Φααααα =  7464960.
Ωααααα =  4478976.      ΦΦαααα =  7776000.
ΦΩαααα =  9331200.      ΩΩαααα =  2799360.
ΦΦΦααα =  4320000.      ΦΦΩααα =  7776000.
ΦΩΩααα =  4665600.      ΩΩΩααα =   933120.
ΦΦΦΦαα =  1350000.      ΦΦΦΩαα =  3240000.
ΦΦΩΩαα =  2916000.      ΦΩΩΩαα =  1166400.
ΩΩΩΩαα =   174960.      ΦΦΦΦΦα =   225000.
ΦΦΦΦΩα =   675000.      ΦΦΦΩΩα =   810000.
ΦΦΩΩΩα =   486000.      ΦΩΩΩΩα =   145800.
ΩΩΩΩΩα =    17496.      ΦΦΦΦΦΦ =    15625.
ΦΦΦΦΦΩ =    56250.      ΦΦΦΦΩΩ =    84375.
ΦΦΦΩΩΩ =    67500.      ΦΦΩΩΩΩ =    30375.
ΦΩΩΩΩΩ =     7290.      ΩΩΩΩΩΩ =      729.
______________________________________
ΦΦΩΩαα, for example, stands for the set of peptides having two amino
acids from the α class, two from Φ, and two from Ω arranged in any
order. There are, for example, 729 = 3.sup.6 sequences composed
entirely of S, L, and R.
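
The class counts in part A of Table 130 can be reproduced with a short calculation. The sketch below is illustrative only and is not part of the original disclosure. It assumes that the α, Φ, and Ω classes group amino acids by the number of NNK codons that encode them (one, two, and three respectively), so that 31 of the 32 NNK codons are stop-free. The function names `class_size`, `class_probability`, and `compositions` are hypothetical.

```python
from math import factorial

# Classes from Table 130. The codon multiplicities (1, 2, 3) are an
# assumption based on the standard genetic code restricted to NNK codons.
CLASSES = {
    "alpha": ("WMFYCIKDENHQ", 1),
    "phi": ("PTAVG", 2),
    "omega": ("SLR", 3),
}
LENGTH = 6  # hexapeptides

# Stop-free NNK codons: 12*1 + 5*2 + 3*3 = 31 (32 NNK codons minus TAG).
STOP_FREE_CODONS = sum(len(aas) * n for aas, n in CLASSES.values())


def class_size(n_alpha, n_phi, n_omega):
    """Count hexapeptides with the given class composition, in any order."""
    counts = (n_alpha, n_phi, n_omega)
    if sum(counts) != LENGTH:
        raise ValueError("composition must sum to the peptide length")
    # Multinomial coefficient: number of ways to arrange the class labels.
    size = factorial(LENGTH)
    for c in counts:
        size //= factorial(c)
    # Choices of amino acid at each position within its class.
    for (aas, _), c in zip(CLASSES.values(), counts):
        size *= len(aas) ** c
    return size


def class_probability(n_alpha, n_phi, n_omega):
    """Probability that a random stop-free (NNK)^6 DNA encodes this class."""
    counts = (n_alpha, n_phi, n_omega)
    dna_per_peptide = 1
    for (_, n_codons), c in zip(CLASSES.values(), counts):
        dna_per_peptide *= n_codons ** c
    return class_size(*counts) * dna_per_peptide / STOP_FREE_CODONS ** LENGTH


def compositions():
    """All (n_alpha, n_phi, n_omega) triples that sum to LENGTH."""
    return [(a, p, LENGTH - a - p)
            for a in range(LENGTH + 1)
            for p in range(LENGTH + 1 - a)]


if __name__ == "__main__":
    for comp in compositions():
        print(comp, class_size(*comp), f"{class_probability(*comp):.3E}")
```

Under these assumptions, `class_size(2, 2, 2)` corresponds to the ΦΦΩΩαα entry, and summing `class_size` over all 28 compositions recovers the total of 64,000,000 stop-free peptide sequences.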

                                      TABLE 130
__________________________________________________________________________
Sampling of a Library encoded by (NNK).sup.6
__________________________________________________________________________
B. Probability that any given stop-free DNA sequence
   will encode a hexapeptide from a stated class.

                P            % of class
αααααα...       3.364E-03    (1.13E-07)
Φααααα...       1.682E-02    (2.25E-07)
Ωααααα...       1.514E-02    (3.38E-07)
ΦΦαααα...       3.505E-02    (4.51E-07)
ΦΩαααα...       6.308E-02    (6.76E-07)
ΩΩαααα...       2.839E-02    (1.01E-06)
ΦΦΦααα...       3.894E-02    (9.01E-07)
ΦΦΩααα...       1.051E-01    (1.35E-06)
ΦΩΩααα...       9.463E-02    (2.03E-06)
ΩΩΩααα...       2.839E-02    (3.04E-06)
ΦΦΦΦαα...       2.434E-02    (1.80E-06)
ΦΦΦΩαα...       8.762E-02    (2.70E-06)
ΦΦΩΩαα...       1.183E-01    (4.06E-06)
ΦΩΩΩαα...       7.097E-02    (6.08E-06)
ΩΩΩΩαα...       1.597E-02    (9.13E-06)
ΦΦΦΦΦα...       8.113E-03    (3.61E-06)
ΦΦΦΦΩα...       3.651E-02    (5.41E-06)
ΦΦΦΩΩα...
                                 6.571E-02                                                                          (8.11E-06)                                                    ΦΦΩΩΩα ...                                               5.914E-02                                                                          (1.22E-05)                                                    ΦΩΩΩΩα...                                              2.661E-02                                                                          (1.83E-05)                                                    ΩΩΩΩΩα...                                            4.790E-03                                                                          (2.74E-05)                                                    ΦΦΦΦΦΦ...                                                        1.127E-03                                                                          (7.21E-06)                                                    ΦΦΦΦΦΩ...                                                      6.084E-03                                                                          (1.08E-05)                                                    ΦΦΦΦΩΩ...                                                    1.369E-02                                                                          (1.62E-05)                                                    ΦΦΦΩΩΩ...                                                  1.643E-02                                                                          (2.43E-05)                                                    ΦΦΩΩΩΩ...                                                1.109E-02                                                                          (3.65E-05)                                                    ΦΩΩΩΩΩ...                                              
3.992E-03                                                                          (5.48E-05)                                                    ΩΩΩΩΩΩ...                                            5.988E-04                                                                          (8.21E-05)                                                    C. Number of different stop-free amino-acid sequences in                        each class expected for various library sizes                               Library size =  1.0000E+06                                                     total =  9.7446E+05  % sampled =  1.52                                       Class  Number  %                                                                              Class  Number  %                                              αααααα...                                          3362.6(.1)                                                                           Φααααα...                                           16803.4(.2)                                           Ωααααα...                                         15114.6(.3)                                                                           ΦΦαααα...                                             34967.8(.4)                                           ΦΩαααα...                                           62871.1(.7)                                                                           ΩΩαααα...                                         28244.3(1.0)                                          ΦΦΦααα...                                               38765.7(.9)                                                                           Φ ΦΩααα...                                            104432.2(1.3)                                         ΦΩΩααα...                                           93672.7(2.0)                                                                          ΩΩΩααα...                                         
27960.3(3.0)                                          ΦΦΦΦαα...                                                 24119.9(1.8)                                                                          ΦΦΦΩαα...                                               86442.5(2.7)                                          ΦΦΩΩαα...                                             115915.5(4.0)                                                                         ΦΩΩΩαα...                                           68853.5(5.9)                                          ΩΩΩΩαα...                                         15261.1(8.7)                                                                          ΦΦΦΦΦα...                                                    7968.1(3.5)                                          ΦΦΦΦΩα...                                                 35537.2(5.3)                                                                          ΦΦΦΩΩα...                                               63117.5(7.8)                                          ΦΦΩΩΩα...                                             55684.4(11.5)                                                                         ΦΩΩΩΩα...                                           24325.9(16.7)                                         ΩΩΩΩΩα...                                          4190.6(24.0)                                                                         ΦΦΦΦΦΦ...                                                      1087.1(7.0)                                          ΦΦΦΦΦΩ...                                                    5767.0(10.3)                                                                         ΦΦΦΦΩΩ...                                                 12637.2(15.0)                                         ΦΦΦΩΩΩ...                                                14581.7(21.6)                                                                        ΦΦΩΩΩΩ...                                   
           9290.2(30.6)                                         ΦΩΩΩΩΩ...                                            3073.9(42.2)                                                                         ΩΩΩΩΩΩ...                                          408.4(56.0)                                          Library size =  3.0000E+06                                                     total =  2.7885E+06  % sampled =  4.36                                       αααααα...                                         10076.4(.3)                                                                           Φααααα...                                           50296.9(.7)                                           Ωααααα...                                         45190.9(1.0)                                                                          ΦΦαααα...                                             104432.2(1.3)                                         ΦΩαααα...                                           187345.5(2.0)                                                                         ΩΩαααα...                                         83880.9(3.0)                                          ΦΦΦααα...                                               115256.6(2.7)                                                                         ΦΦΩααα...                                             309107.9(4.0)                                         ΦΩΩααα...                                           275413.9(5.9)                                                                         ΩΩΩααα...                                         81392.5(8.7)                                          ΦΦΦΦαα...                                                 71074.5(5.3)                                                                          ΦΦΦΩαα...                                               252470.2(7.8)                                         ΦΦΩΩαα...                                             
334106.2(11.5)                                                                        ΦΩΩΩαα...                                           194606.9(16.7)                                        ΩΩΩΩαα...                                         41905.9(24.0)                                                                         ΦΦΦΦΦα...                                                    23067.8(10.3)                                        ΦΦΦΦΩα...                                                 101097.3(15.0)                                                                        ΦΦΦΩΩα...                                               174981.0(21.6)                                        ΦΦΩΩΩα...                                             148643.7(30.6)                                                                        ΦΩΩΩΩα...                                           61478.9(42.2)                                         ΩΩΩΩΩα...                                          9801.0(56.0)                                                                         ΦΦΦΦΦΦ...                                                      3039.6(19.5)                                         ΦΦΦΦΦΩ...                                                   15587.7(27.7)                                                                         ΦΦΦΦΩΩ...                                                 32516.8(38.5)                                         ΦΦΦΩΩΩ...                                               34975.6(51.8)                                                                         ΦΦΩΩΩΩ...                                             20215.5(66.6)                                         ΦΩΩΩΩΩ...                                            5879.9(80.7)                                                                         ΩΩΩΩΩΩ...                                          
667.0(91.5)                                          Library size =  1.0000E+07                                                     total =  8.1204E+06  % sampled =  12.69                                      αααααα...                                         33455.9(1.1)                                                                          Φααααα...                                           166342.4(2.2)                                         Ωααααα...                                         148871.1(3.3)                                                                         ΦΦαααα...                                             342685.7(4.4)                                         ΦΩαααα...                                           609987.6(6.5)                                                                         ΩΩαααα...                                         269958.3(9.6)                                         ΦΦΦααα...                                               372371.8(8.6)                                                                         ΦΦΩααα...                                             983416.4(12.6)                                        ΦΩΩααα...                                           856471.6(18.4)                                                                        ΩΩΩααα...                                         244761.5(26.2)                                        ΦΦΦΦαα...                                                 222702.0(16.5)                                                                        ΦΦΦΩαα...                                               767692.5(23.7)                                        ΦΦΩΩαα...                                             972324.6(33.3)                                                                        ΦΩΩΩαα...                                           531651.3(45.6)                                        ΩΩΩΩαα...                                         
104722.3(59.9)                                                                        ΦΦΦΦΦα...                                                   68111.0(30.3)                                         ΦΦΦΦΩα...                                                 281976.3(41.8)                                                                        ΦΦΦΩΩα...                                               450120.2(55.6)                                        ΦΦΩΩΩα...                                             342072.1(70.4)                                                                        ΦΩΩΩΩα...                                           122302.6(83.9)                                        ΩΩΩΩΩα...                                         16364.0(93.5)                                                                         ΦΦΦΦΦΦ...                                                      8028.0(51.4)                                         ΦΦΦΦΦΩ...                                                   37179.9(66.1)                                                                         ΦΦΦΦΩΩ...                                                 67719.5(80.3)                                         ΦΦΦΩΩΩ...                                               61580.0(91.2)                                                                         ΦΦΩΩΩΩ...                                             29586.1(97.4)                                         ΦΩΩΩΩΩ...                                            7259.5(99.6)                                                                         ΩΩΩΩΩΩ...                                          728.8(100.0)                                         Library size =   3.0000E+07                                                    total =  1.8633E+07  % sampled =  29.11                                      αααααα...                                         
99247.4(3.3)                                                                          Φααααα...                                           487990.0(6.5)                                         Ωααααα...                                         431933.3(9.6)                                                                         ΦΦαααα...                                             983416.5(12.6)                                        ΦΩαααα...                                          1712943.0(18.4)                                                                        ΩΩαααα...                                         734284.6(26.2)                                        ΦΦΦααα...                                              1023590.0(23.7)                                                                        ΦΦΩααα...                                            2592866.0(33.3)                                        ΦΩΩααα...                                          2126605.0(45.6)                                                                        ΩΩΩααα...                                         558519.0(59.9)                                        ΦΦΦΦαα...                                                 563952.6(41.8)                                                                        ΦΦΦΩαα...                                              1800481.0(55.6)                                        ΦΦΩΩαα...                                            2052433.0(70.4)                                                                        ΦΩΩΩαα...                                           978420.5(83.9)                                        ΩΩΩΩαα...                                         163640.3(93.5)                                                                        ΦΦΦΦΦα...                                                   148719.7(66.1)                                        ΦΦΦΦΩα...                                                 
541755.7(80.3)                                                                        ΦΦΦΩΩα...                                               738960.1(91.2)                                        ΦΦΩΩΩα...                                             473377.0(97.4)                                                                        ΦΩΩΩΩα...                                           145189.7(99.6)                                        ΩΩΩΩΩα ...                                        17491.3(100.0)                                                                        ΦΦΦΦΦΦ...                                                     13829.1(88.5)                                         ΦΦΦΦΦΩ...                                                   54058.1(96.1)                                                                         ΦΦΦΦΩΩ...                                                 83726.0(99.2)                                         ΦΦΦΩΩΩ...                                               67454.5(99.9)                                                                         ΦΦΩΩΩΩ...                                             30374.5(100.0)                                        ΦΩΩΩΩΩ...                                            7290.0(100.0)                                                                        ΩΩΩΩΩΩ...                                          729.0(100.0)                                         Library size =  7.6000E+07                                                     total =  3.2125E+07  % sampled =  50.19                                      αααααα...                                         245057.8(8.2)                                                                         Φααααα...                                          1175010.0(15.7)                                        Ωααααα...                                        1014733.0(22.7)                                                                        ΦΦαααα...             
                               2255280.0(29.0)                                        ΦΩαααα...                                          3749112.0(40.2)                                                                        ΩΩαααα...                                        1504128.0(53.7)                                        ΦΦΦααα...                                              2142478.0(49.6)                                                                        ΦΦΩααα...                                            4993247.0(64.2)                                        ΦΩΩααα...                                          3666785.0(78.6)                                                                        ΩΩΩααα...                                         840691.9(90.1)                                        ΦΦΦΦαα...                                                1007002.0(74.6)                                                                        ΦΦΦΩαα...                                              2825063.0(87.2)                                        ΦΦΩΩαα...                                            2782358.0(95.4)                                                                        ΦΩΩΩαα...                                          1154956.0(99.0)                                        ΩΩΩΩαα...                                         174790.0(99.9)                                                                        ΦΦΦΦΦα...                                                   210475.6(93.5)                                        ΦΦΦΦΩα...                                                 663929.3(98.4)                                                                        ΦΦΦΩΩα...                                               808298.6(99.8)                                        ΦΦΩΩΩα...                                             485953.2(100.0)                                                                       ΦΩΩΩΩα...                           
                145799.9(100.0)                                       ΩΩΩΩΩα...                                         17496.0(100.0)                                                                        ΦΦΦΦΦΦ...                                                     15559.9(99.6)                                         ΦΦΦΦΦΩ...                                                   56234.9(100.0)                                                                        ΦΦΦΦΩΩ...                                                 84374.6(100.0)                                        ΦΦΦΩΩΩ...                                               67500.0(100.0)                                                                        ΦΦΩΩΩΩ...                                             30375.0(100.0)                                        ΦΩΩΩΩΩ...                                            7290.0(100.0)                                                                        ΩΩΩΩΩΩ...                                          729.0(100.0)                                         Library size =  1.0000E+08                                                     total =  3.6537E+07  % sampled =  57.09                                      αααααα...                                         318185.1(10.7)                                                                        Φααααα...                                          1506161.0(20.2)                                        Ωααααα...                                        1284677.0(28.7)                                                                        ΦΦαααα...                                            2821285.0(36.3)                                        ΦΩααα α...                                         4585163.0(49.1)                                                                        ΩΩαααα...                                        1783932.0(63.7)                                        ΦΦΦααα...                                   
           2566085.0(59.4)                                                                        ΦΦΩααα...                                            5764391.0(74.1)                                        ΦΩΩααα...                                          4051713.0(86.8)                                                                        ΩΩΩααα...                                         888584.3(95.2)                                        ΦΦΦΦαα...                                                1127473.0(83.5)                                                                        ΦΦΦΩαα...                                              3023170.0(93.3)                                        ΦΦΩΩαα...                                            2865517.0(98.3)                                                                        ΦΩΩΩαα...                                          1163743.0(99.8)                                        ΩΩΩΩαα...                                         174941.0(100.0)                                                                       ΦΦΦΦΦα...                                                   218886.6(97.3)                                        ΦΦΦΦΩα...                                                 671976.9(99.6)                                                                        ΦΦΦΩΩα...                                               809757.3(100.0)                                       ΦΦΩΩΩα...                                             485997.5(100.0)                                                                       ΦΩΩΩΩα...                                           145800.0(100.0)                                       ΩΩΩΩΩα...                                         17496.0(100.0)                                                                        ΦΦΦΦΦΦ...                                                     15613.5(99.9)                                         ΦΦΦΦΦΩ...                                         
          56248.9(100.0)                                                                        ΦΦΦΦΩΩ...                                                 84375.0(100.0)                                        ΦΦΦΩΩΩ...                                               67500.0(100.0)                                                                        ΦΦΩΩΩΩ...                                             30375.0(100.0)                                        ΦΩΩΩΩΩ...                                            7290.0(100.0)                                                                        ΩΩΩΩΩΩ                                             729.0(100.0)                                         Library size =  3.0000E+08                                                     total =  5.2634E+07  % sampled =  82.24                                      αααααα...                                         856451.3(28.7)                                                                        Φααααα...                                          3668130.0(49.1)                                        Ωααααα...                                        2854291.0(63.7)                                                                        ΦΦαααα...                                            5764391.0(74.1)                                        ΦΩαααα...                                          8103426.0(86.8)                                                                        ΩΩαααα...                                        2665753.0(95.2)                                        ΦΦΦααα...                                              4030893.0(93.3)                                                                        ΦΦΩααα...                                            7641378.0(98.3)                                        ΦΩΩααα...                                          4654972.0(99.8)                                                                        ΩΩΩααα...                   
                      933018.6(100.0)                                       ΦΦΦΦαα...                                                1343954.0(99.6)                                                                        ΦΦΦΩαα...                                              3239029.0(100.0)                                       ΦΦΩΩαα...                                            2915985.0(100.0)                                                                       ΦΩΩΩαα...                                          1166400.0(100.0)                                       ΩΩΩΩαα...                                         174960.0(100.0)                                                                       ΦΦΦΦΦα...                                                   224995.5(100.0)                                       ΦΦΦΦΩα...                                                 674999.9(100.0)                                                                       ΦΦΦΩΩα...                                               810000.0(100.0)                                       ΦΦΩΩΩα...                                             486000.0(100.0)                                                                       ΦΩΩΩΩα...                                           145800.0(100.0)                                       ΩΩ ΩΩΩα...                                        17496.0(100.0)                                                                        ΦΦΦΦΦΦ...                                                     15625.0(100.0)                                        ΦΦΦΦΦΩ...                                                   56250.0(100.0)                                                                        ΦΦΦΦΩΩ...                                                 84375.0(100.0)                                        ΦΦΦΩΩΩ...                                               67500.0(100.0)                                                                        ΦΦΩΩΩΩ...             
             30375.0(100.0)    ΦΩΩΩΩΩ...     7290.0(100.0)
ΩΩΩΩΩΩ...      729.0(100.0)

Library size = 1.0000E+09
total = 6.1999E+07   % sampled = 96.87
αααααα...  2018278.0(67.6)
Φααααα...  6680917.0(89.5)     Ωααααα...  4326519.0(96.6)
ΦΦαααα...  7690221.0(98.9)     ΦΩαααα...  9320389.0(99.9)
ΩΩαααα...  2799250.0(100.0)    ΦΦΦααα...  4319475.0(100.0)
ΦΦΩααα...  7775990.0(100.0)    ΦΩΩααα...  4665600.0(100.0)
ΩΩΩααα...   933120.0(100.0)    ΦΦΦΦαα...  1350000.0(100.0)
ΦΦΦΩαα...  3240000.0(100.0)    ΦΦΩΩαα...  2916000.0(100.0)
ΦΩΩΩαα...  1166400.0(100.0)    ΩΩΩΩαα...   174960.0(100.0)
ΦΦΦΦΦα...   225000.0(100.0)    ΦΦΦΦΩα...   675000.0(100.0)
ΦΦΦΩΩα...   810000.0(100.0)    ΦΦΩΩΩα...   486000.0(100.0)
ΦΩΩΩΩα...   145800.0(100.0)    ΩΩΩΩΩα...    17496.0(100.0)
ΦΦΦΦΦΦ...    15625.0(100.0)    ΦΦΦΦΦΩ...    56250.0(100.0)
ΦΦΦΦΩΩ...    84375.0(100.0)    ΦΦΦΩΩΩ...    67500.0(100.0)
ΦΦΩΩΩΩ...    30375.0(100.0)    ΦΩΩΩΩΩ...     7290.0(100.0)
ΩΩΩΩΩΩ...      729.0(100.0)

Library size = 3.0000E+09
total = 6.3890E+07   % sampled = 99.83
αααααα...  2884346.0(96.6)
Φααααα...  7456311.0(99.9)     Ωααααα...  4478800.0(100.0)
ΦΦαααα...  7775990.0(100.0)    ΦΩαααα...  9331200.0(100.0)
ΩΩαααα...  2799360.0(100.0)    ΦΦΦααα...  4320000.0(100.0)
ΦΦΩααα...  7776000.0(100.0)    ΦΩΩααα...  4665600.0(100.0)
ΩΩΩααα...   933120.0(100.0)    ΦΦΦΦαα...  1350000.0(100.0)
ΦΦΦΩαα...  3240000.0(100.0)    ΦΦΩΩαα...  2916000.0(100.0)
ΦΩΩΩαα...  1166400.0(100.0)    ΩΩΩΩαα...   174960.0(100.0)
ΦΦΦΦΦα...   225000.0(100.0)    ΦΦΦΦΩα...   675000.0(100.0)
ΦΦΦΩΩα...   810000.0(100.0)    ΦΦΩΩΩα...   486000.0(100.0)
ΦΩΩΩΩα...   145800.0(100.0)    ΩΩΩΩΩα...    17496.0(100.0)
ΦΦΦΦΦΦ...    15625.0(100.0)    ΦΦΦΦΦΩ...    56250.0(100.0)
ΦΦΦΦΩΩ...    84375.0(100.0)    ΦΦΦΩΩΩ...    67500.0(100.0)
ΦΦΩΩΩΩ...    30375.0(100.0)    ΦΩΩΩΩΩ...     7290.0(100.0)
ΩΩΩΩΩΩ...      729.0(100.0)

D. Formulae for tabulated quantities.
Lsize is the number of independent transformants.
31**6 is 31 to sixth power; 6*3 means 6 times 3.
A = Lsize/(31**6)
α can be one of [WMFYCIKDENHQ.]
Φ can be one of [PTAVG]
Ω can be one of [SLR]
F0 = (12)**6    F1 = (12)**5    F2 = (12)**4
F3 = (12)**3    F4 = (12)**2    F5 = (12)
F6 = 1
αααααα = F0 * (1-exp(-A))
Φααααα = 6 * 5 * F1 * (1-exp(-2*A))
Ωααααα = 6 * 3 * F1 * (1-exp(-3*A))
ΦΦαααα = (15) * 5**2 * F2 * (1-exp(-4*A))
ΦΩαααα = (6*5)*5*3 * F2 * (1-exp(-6*A))
ΩΩαααα = (15) * 3**2 * F2 * (1-exp(-9*A))
ΦΦΦααα = (20)*(5**3) * F3 * (1-exp(-8*A))
ΦΦΩααα = (60)*(5*5*3) * F3 * (1-exp(-12*A))
ΦΩΩααα = (60)*(5*3*3) * F3 * (1-exp(-18*A))
ΩΩΩααα = (20)*(3)**3 * F3 * (1-exp(-27*A))
ΦΦΦΦαα = (15)*(5)**4 * F4 * (1-exp(-16*A))
ΦΦΦΩαα = (60)*(5)**3*3 * F4 * (1-exp(-24*A))
ΦΦΩΩαα = (90)*(5*5*3*3) * F4 * (1-exp(-36*A))
ΦΩΩΩαα = (60)*(5*3*3*3) * F4 * (1-exp(-54*A))
ΩΩΩΩαα = (15)*(3)**4 * F4 * (1-exp(-81*A))
ΦΦΦΦΦα = (6)*(5)**5 * F5 * (1-exp(-32*A))
ΦΦΦΦΩα = 30*5*5*5*5*3 * F5 * (1-exp(-48*A))
ΦΦΦΩΩα = 60*5*5*5*3*3 * F5 * (1-exp(-72*A))
ΦΦΩΩΩα = 60*5*5*3*3*3 * F5 * (1-exp(-108*A))
ΦΩΩΩΩα = 30*5*3*3*3*3 * F5 * (1-exp(-162*A))
ΩΩΩΩΩα = 6*3*3*3*3*3 * F5 * (1-exp(-243*A))
ΦΦΦΦΦΦ = 5**6 * (1-exp(-64*A))
ΦΦΦΦΦΩ = 6*3*5**5 * (1-exp(-96*A))
ΦΦΦΦΩΩ = 15*3*3*5**4 * (1-exp(-144*A))
ΦΦΦΩΩΩ = 20*3**3*5**3 * (1-exp(-216*A))
ΦΦΩΩΩΩ = 15*3**4*5**2 * (1-exp(-324*A))
ΦΩΩΩΩΩ = 6*3**5*5 * (1-exp(-486*A))
ΩΩΩΩΩΩ = 3**6 * (1-exp(-729*A))
total = αααααα + Φααααα + Ωααααα + ΦΦαααα + ΦΩαααα + ΩΩαααα +
        ΦΦΦααα + ΦΦΩααα + ΦΩΩααα + ΩΩΩααα +
        ΦΦΦΦαα + ΦΦΦΩαα + ΦΦΩΩαα + ΦΩΩΩαα + ΩΩΩΩαα +
        ΦΦΦΦΦα + ΦΦΦΦΩα + ΦΦΦΩΩα + ΦΦΩΩΩα + ΦΩΩΩΩα + ΩΩΩΩΩα +
        ΦΦΦΦΦΦ + ΦΦΦΦΦΩ + ΦΦΦΦΩΩ + ΦΦΦΩΩΩ + ΦΦΩΩΩΩ + ΦΩΩΩΩΩ + ΩΩΩΩΩΩ
__________________________________________________________________________
 (The amino acids referred to in Table 130 need not be in sequence, but if
 they are, the sequences all have SEQ ID NO: 88.)
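
The formulae of section D can be evaluated mechanically. The following is a minimal sketch in Python, not part of the original disclosure; the function name expected_by_class and the constant names are illustrative assumptions. It assumes, as the formulae do, six NNK codons with 31 coding codons each (stops vanish), and it generates the 28 classes from the counts of Φ, Ω, and α positions rather than writing each formula out.

```python
from math import exp, factorial

# Assumed NNK classes, taken from section D:
#   alpha: 12 amino acids, 1 codon each
#   Phi:    5 amino acids, 2 codons each
#   Omega:  3 amino acids, 3 codons each
N_POSITIONS = 6
N_CODING_CODONS = 31  # 32 NNK codons minus the one stop codon


def expected_by_class(lsize):
    """Expected number of distinct amino-acid sequences seen per class.

    lsize is the number of independent transformants (Lsize in section D).
    Returns a dict mapping a class label such as 'ΦΩαααα' to the expected count.
    """
    a = lsize / N_CODING_CODONS ** N_POSITIONS  # A = Lsize/(31**6)
    result = {}
    for n_phi in range(N_POSITIONS + 1):
        for n_omega in range(N_POSITIONS + 1 - n_phi):
            n_alpha = N_POSITIONS - n_phi - n_omega
            # Number of ways to place the three kinds of residue in six positions.
            arrangements = factorial(N_POSITIONS) // (
                factorial(n_phi) * factorial(n_omega) * factorial(n_alpha)
            )
            # Number of amino-acid sequences in the class.
            n_seqs = arrangements * 5 ** n_phi * 3 ** n_omega * 12 ** n_alpha
            # Number of DNA sequences encoding each amino-acid sequence.
            multiplicity = 2 ** n_phi * 3 ** n_omega
            label = "Φ" * n_phi + "Ω" * n_omega + "α" * n_alpha
            result[label] = n_seqs * (1.0 - exp(-multiplicity * a))
    return result


if __name__ == "__main__":
    counts = expected_by_class(1.0e9)
    total = sum(counts.values())
    print(f"total = {total:.4E}  % sampled = {100.0 * total / 20 ** 6:.2f}")
```

Each term has the form (number of sequences in the class) times (1-exp(-m*A)), where m is the number of DNA sequences that encode one amino-acid sequence of that class; this is the Poisson probability that a given sequence appears at least once among Lsize transformants.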

                  TABLE 131
______________________________________
Sampling of a Library
Encoded by (NNT)^4 (NNG)^2
______________________________________
X can be F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G
Γ can be L^2,R^2,S,W,P,Q,M,T,K,V,A,E,G
Library comprises 8.55 · 10^6 amino-acid sequences; 1.47 · 10^7
DNA sequences.
Total number of possible aa sequences = 8,555,625
x   LVPTARGFYCHIND
S   S
Θ   VPTAGWQMKES
Ω   LR
The first, second, fifth, and sixth positions can
hold x or S; the third and fourth positions can hold Θ or
Ω. I have lumped sequences by the number of xs, Ss, Θs,
and Ωs.
For example xxΘΩSS stands for:
[xxΘΩSS, xSΘΩxS, xSΘΩSx,
 SSΘΩxx, SxΘΩxS, SxΘΩSx,
 xxΩΘSS, xSΩΘxS, xSΩΘSx,
 SSΩΘxx, SxΩΘxS, SxΩΘSx]
The following table shows the likelihood that any
particular DNA sequence will fall into one of the defined
classes.
Library size = Sampling = .00001%
total........  1.0000E+00    % sampled....  1.1688E-07
xxΘΘxx.......  3.1524E-01    xxΘΩxx.......  2.2926E-01
xxΩΩxx.......  4.1684E-02    xxΘΘxS.......  1.8013E-01
xxΘΩxS.......  1.3101E-01    xxΩΩxS.......  2.3819E-02
xxΘΘSS.......  3.8600E-02    xxΘΩSS.......  2.8073E-02
xxΩΩSS.......  5.1042E-03    xSΘΘSS.......  3.6762E-03
xSΘΩSS.......  2.6736E-03    xSΩΩSS.......  4.8611E-04
SSΘΘSS.......  1.3129E-04    SSΘΩSS.......  9.5486E-05
SSΩΩSS.......  1.7361E-05
The following sections show how many sequences of
each class are expected for libraries of different sizes.

Library size = 1.0000E+05
total......  9.91374E+04   fraction sampled = 1.1587E-02
Type         Number   %        Type         Number   %
______________________________________
xxΘΘxx.....    31416.9(.7)     xxΘΩxx.....    22771.4(1.3)
xxΩΩxx.....     4112.4(2.7)    xxΘΘxS.....    17891.8(1.3)
xxΘΩxS.....    12924.6(2.7)    xxΩΩxS.....     2318.5(5.3)
xxΘΘSS.....     3808.1(2.7)    xxΘΩSS.....     2732.5(5.3)
xxΩΩSS.....      483.7(10.3)   xSΘΘSS.....      357.8(5.3)
xSΘΩSS.....      253.4(10.3)   xSΩΩSS.....       43.7(19.5)
SSΘΘSS.....       12.4(10.3)   SSΘΩSS.....        8.6(19.5)
SSΩΩSS.....        1.4(35.2)

Library size = 1.0000E+06
total......  9.2064E+05   fraction sampled = 1.0761E-01
xxΘΘxx.....   304783.9(6.6)    xxΘΩxx.....   214394.0(12.7)
xxΩΩxx.....    36508.6(23.8)   xxΘΘxS.....   168452.5(12.7)
xxΘΩxS.....   114741.4(23.8)   xxΩΩxS.....    18383.8(41.9)
xxΘΘSS.....    33807.7(23.8)   xxΘΩSS.....    21666.6(41.9)
xxΩΩSS.....     3114.6(66.2)   xSΘΘSS.....     2837.3(41.9)
xSΘΩSS.....     1631.5(66.2)   xSΩΩSS.....      198.4(88.6)
SSΘΘSS.....       80.1(66.2)   SSΘΩSS.....       39.0(88.6)
SSΩΩSS.....        3.9(98.7)

Library size = 3.0000E+06
total......  2.3880E+06   fraction sampled = 2.7912E-01
xxΘΘxx.....   855709.5(18.4)   xxΘΩxx.....   565051.6(33.4)
xxΩΩxx.....    85564.7(55.7)   xxΘΘxS.....   443969.1(33.4)
xxΘΩxS.....   268917.8(55.7)   xxΩΩxS.....    35281.3(80.4)
xxΘΘSS.....    79234.7(55.7)   xxΘΩSS.....    41581.5(80.4)
xxΩΩSS.....     4522.6(96.1)   xSΘΘSS.....     5445.2(80.4)
xSΘΩSS.....     2369.0(96.1)   xSΩΩSS.....      223.7(99.9)
SSΘΘSS.....      116.3(96.1)   SSΘΩSS.....       43.9(99.9)
SSΩΩSS.....        4.0(100.0)

Library size = 8.5556E+06
total......  4.9303E+06   fraction sampled = 5.7626E-01
xxΘΘxx.....  2046301.0(44.0)   xxΘΩxx.....  1160645.0(68.7)
xxΩΩxx.....   138575.9(90.2)   xxΘΘxS.....   911935.6(68.7)
xxΘΩxS.....   435524.3(90.2)   xxΩΩxS.....    43480.7(99.0)
xxΘΘSS.....   128324.1(90.2)   xxΘΩSS.....    51245.1(99.0)
xxΩΩSS.....     4703.6(100.0)  xSΘΘSS.....     6710.7(99.0)
xSΘΩSS.....     2463.8(100.0)  xSΩΩSS.....      224.0(100.0)
SSΘΘSS.....      121.0(100.0)  SSΘΩSS.....       44.0(100.0)
SSΩΩSS.....        4.0(100.0)

Library size = 1.0000E+07
total......  5.3667E+06   fraction sampled = 6.2727E-01
xxΘΘxx.....  2289093.0(49.2)   xxΘΩxx.....  1254877.0(74.2)
xxΩΩxx.....   143467.0(93.4)   xxΘΘxS.....   985974.9(74.2)
xxΘΩxS.....   450896.3(93.4)   xxΩΩxS.....    43710.7(99.6)
xxΘΘSS.....   132853.4(93.4)   xxΘΩSS.....    51516.1(99.6)
xxΩΩSS.....     4703.9(100.0)  xSΘΘSS.....     6746.2(99.6)
xSΘΩSS.....     2464.0(100.0)  xSΩΩSS.....      224.0(100.0)
SSΘΘSS.....      121.0(100.0)  SSΘΩSS.....       44.0(100.0)
SSΩΩSS.....        4.0(100.0)

Library size = 3.0000E+07
total......  7.8961E+06   fraction sampled = 9.2291E-01
xxΘΘxx.....  4040589.0(86.9)   xxΘΩxx.....  1661409.0(98.3)
xxΩΩxx.....   153619.1(100.0)  xxΘΘxS.....  1305393.0(98.3)
xxΘΩxS.....   482802.9(100.0)  xxΩΩxS.....    43904.0(100.0)
xxΘΘSS.....   142254.4(100.0)  xxΘΩSS.....    51744.0(100.0)
xxΩΩSS.....     4704.0(100.0)  xSΘΘSS.....     6776.0(100.0)
xSΘΩSS.....     2464.0(100.0)  xSΩΩSS.....      224.0(100.0)
SSΘΘSS.....      121.0(100.0)  SSΘΩSS.....       44.0(100.0)
SSΩΩSS.....        4.0(100.0)

Library size = 5.0000E+07
total......  8.3956E+06   fraction sampled = 9.8130E-01
xxΘΘxx.....  4491779.0(96.6)   xxΘΩxx.....  1688387.0(99.9)
xxΩΩxx.....   153663.8(100.0)  xxΘΘxS.....  1326590.0(99.9)
xxΘΩxS.....   482943.4(100.0)  xxΩΩxS.....    43904.0(100.0)
xxΘΘSS.....   142295.8(100.0)  xxΘΩSS.....    51744.0(100.0)
xxΩΩSS.....     4704.0(100.0)  xSΘΘSS.....     6776.0(100.0)
xSΘΩSS.....     2464.0(100.0)  xSΩΩSS.....      224.0(100.0)
SSΘΘSS.....      121.0(100.0)  SSΘΩSS.....       44.0(100.0)
SSΩΩSS.....        4.0(100.0)

Library size = 1.0000E+08
total......  8.5503E+06   fraction sampled = 9.9938E-01
xxΘΘxx.....  4643063.0(99.9)   xxΘΩxx.....  1690302.0(100.0)
xxΩΩxx.....   153664.0(100.0)  xxΘΘxS.....  1328094.0(100.0)
xxΘΩxS.....   482944.0(100.0)  xxΩΩxS.....    43904.0(100.0)
xxΘΘSS.....   142296.0(100.0)  xxΘΩSS.....    51744.0(100.0)
xxΩΩSS.....     4704.0(100.0)  xSΘΘSS.....     6776.0(100.0)
xSΘΩSS.....     2464.0(100.0)  xSΩΩSS.....      224.0(100.0)
SSΘΘSS.....      121.0(100.0)  SSΘΩSS.....       44.0(100.0)
SSΩΩSS.....        4.0(100.0)
______________________________________
 (The amino acids referred to in Table 131 need not be in sequence, but if
 they are, the sequences all have SEQ ID NO: 88.)
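
The class likelihoods and expected counts of Table 131 follow from the codon counts given in its header. The following is a minimal sketch in Python, not part of the original disclosure; the function name table131_classes and the helper names are illustrative assumptions. It assumes that the four NNT positions each hold x (14 amino acids, one codon each) or S (two codons), that the two NNG positions each hold Θ (11 amino acids, one codon each) or Ω (L or R, two codons each), and that the NNG stop codon is excluded, giving 16^4 · 15^2 DNA sequences.

```python
from math import comb, exp

N_DNA = 16 ** 4 * 15 ** 2  # 14,745,600 DNA sequences (NNG stop excluded)

# Class labels by number of S residues in the four NNT positions.
_OUTER = {0: ("xx", "xx"), 1: ("xx", "xS"), 2: ("xx", "SS"),
          3: ("xS", "SS"), 4: ("SS", "SS")}


def table131_classes(lsize):
    """Return {label: (likelihood, expected_count)} for a library of lsize.

    likelihood is the chance that one random DNA sequence falls in the class;
    expected_count is the expected number of distinct amino-acid sequences of
    the class present among lsize independent transformants.
    """
    a = lsize / N_DNA
    rows = {}
    for n_s in range(5):          # S residues among the NNT positions
        for n_om in range(3):     # Omega residues among the NNG positions
            # Amino-acid sequences in the class.
            n_seqs = (comb(4, n_s) * 14 ** (4 - n_s)
                      * comb(2, n_om) * 11 ** (2 - n_om) * 2 ** n_om)
            # DNA sequences per amino-acid sequence (S and Omega have 2 codons).
            multiplicity = 2 ** n_s * 2 ** n_om
            left, right = _OUTER[n_s]
            label = left + "Θ" * (2 - n_om) + "Ω" * n_om + right
            likelihood = n_seqs * multiplicity / N_DNA
            expected = n_seqs * (1.0 - exp(-multiplicity * a))
            rows[label] = (likelihood, expected)
    return rows


if __name__ == "__main__":
    for label, (p, n) in table131_classes(1.0e5).items():
        print(f"{label}  {p:.4E}  {n:10.1f}")
```

Summing the number of amino-acid sequences over the fifteen classes gives 8,555,625, the total stated in the table header.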

TABLE 132
______________________________________
Relative efficiencies of various simple variegation codons

                        Number of codons
              5                   6                   7
              #DNA/#AA            #DNA/#AA            #DNA/#AA
              [#DNA]              [#DNA]              [#DNA]
vgCodon       (#AA)               (#AA)               (#AA)
______________________________________
NNK           8.95                13.86               21.49
assuming      [2.86 · 10.sup.7]   [8.87 · 10.sup.8]   [2.75 · 10.sup.10]
stops vanish  (3.2 · 10.sup.6)    (6.4 · 10.sup.7)    (1.28 · 10.sup.9)
NNT           1.38                1.47                1.57
              [1.05 · 10.sup.6]   [1.68 · 10.sup.7]   [2.68 · 10.sup.8]
              (7.59 · 10.sup.5)   (1.14 · 10.sup.7)   (1.71 · 10.sup.8)
NNG           2.04                2.36                2.72
assuming      [7.59 · 10.sup.5]   [1.14 · 10.sup.7]   [1.71 · 10.sup.8]
stops vanish  (3.7 · 10.sup.5)    (4.83 · 10.sup.6)   (6.27 · 10.sup.7)
______________________________________
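The entries of Table 132 follow from simple counting, and the short sketch below is one way to reproduce them. It assumes the standard genetic code, reads K as "G or T", and interprets "assuming stops vanish" as discarding any DNA sequence that contains a stop codon. The names `vg_codon_stats`, `CODE`, and `IUPAC` are illustrative and do not come from the specification.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Only the ambiguity codes needed for the three vgCodons in Table 132.
IUPAC = {"N": "TCAG", "K": "GT", "T": "T", "G": "G"}


def vg_codon_stats(vg_codon, n_codons):
    """Return (#DNA, #AA, #DNA/#AA) for n_codons copies of a variegated codon.

    Assumption: sequences containing a stop codon are excluded from #DNA.
    """
    codons = ["".join(c) for c in product(*(IUPAC[x] for x in vg_codon))]
    coding = [c for c in codons if CODE[c] != "*"]
    n_dna = len(coding) ** n_codons
    n_aa = len({CODE[c] for c in coding}) ** n_codons
    return n_dna, n_aa, n_dna / n_aa


for vg in ("NNK", "NNT", "NNG"):
    for n in (5, 6, 7):
        n_dna, n_aa, ratio = vg_codon_stats(vg, n)
        print(f"{vg} n={n}: #DNA={n_dna:.3g} #AA={n_aa:.3g} ratio={ratio:.2f}")
```

Under these assumptions NNK contributes 31 coding codons for 20 amino acids, NNT contributes 16 codons for 15 amino acids, and NNG contributes 15 coding codons for 13 amino acids, so each table entry is a power of one of these counts.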

TABLE 140
______________________________________
Effect of anti-BPTI IgG on phage titer.

                                     +Anti-BPTI       Eluted
Strain        Input     +Anti-BPTI   +Protein A (a)   Phage
______________________________________
M13MP18       100 (b)   98           92               7 · 10.sup.-4
BPTI.3        100       26           21               6
M13MB48 (c)   100       90           36               0.8
M13MB48 (d)   100       60           40               2.6
______________________________________
(a) Protein A-agarose beads.
(b) Percentage of input phage measured as plaque forming units.
(c) Batch number 3.
(d) Batch number 4.

TABLE 142
__________________________________________________________________________
Effect of anti-BPTI or protein A on phage titer.

                        No                          +Protein A   +Anti-BPTI
Strain        Input     Addition   +Anti-BPTI       (a)          +Protein A
__________________________________________________________________________
M13MP18       100 (b)   107        105              72           65
M13MB48 (b)   100       92         7 · 10.sup.-3    58           >10.sup.-4
__________________________________________________________________________
(a) Protein A-agarose beads.
(b) Percentage of input phage measured as plaque forming units.
(c) Batch number 3.
(d) Batch number 4.

TABLE 142
______________________________________
Effect of anti-BPTI and non-immune serum on phage titer

                                            +Anti-BPTI   +NRS
                        +Anti-    +NRS      +Protein A   +Protein
Strain        Input     BPTI      (a)       (b)          A
______________________________________
M13MP18       100 (c)   65        104       71           88
M13MB48 (d)   100       30        125       13           121
M13MB48 (e)   100       2         105       0.7          110
______________________________________
(a) Purified IgG from normal rabbit serum.
(b) Protein A-agarose beads.
(c) Percentage of input phage measured as plaque forming units.
(d) Batch number 4.
(e) Batch number 5.

TABLE 143
______________________________________
Loss in titer of display phage with anhydrotrypsin.

              Anhydrotrypsin Beads       Streptavidin Beads
                        Post                       Post
Strain        Start     Incubation       Start     Incubation
______________________________________
M13MP18       100 (a)   121              ND        ND
M13MB48       100       58               100       98
5AA Pool      100       44               100       93
______________________________________
(a) Plaque forming units expressed as a percentage of input.

TABLE 144
______________________________________
Plaque forming units expressed as a percentage of input.

                                   Relative to
Strain         Eluted Phage (a)    M13MP18
______________________________________
Experiment 1
M13MP18        0.2 (a)             1.0
BPTI-IIIMK     7.9                 39.5
M13MB48        11.2                56.0
Experiment 2
M13mp18        0.3                 1.0
BPTI-IIIMK     12.0                40.0
M13MB56        17.0                56.7
______________________________________
(a) Plaque forming units acid eluted from beads, expressed as a percentage
of the input.

TABLE 145
______________________________________
Binding of Display Phage to Anhydrotrypsin or Trypsin.

              Anhydrotrypsin Beads         Trypsin Beads
              Eluted       Relative        Eluted             Relative
Strain        Phage (a)    Binding (b)     Phage              Binding
______________________________________
M13MP18       0.1          1               2.3 × 10.sup.-4    1.0
BPTI-IIIMK    9.1          91              1.17               5 × 10.sup.3
M13.3X7       25.0         250             1.4                6 × 10.sup.3
M13.3X11      9.2          92              0.27               1.2 × 10.sup.3
______________________________________
(a) Plaque forming units eluted from beads, expressed as a percentage of
the input.
(b) Relative to the non-display phage, M13MP18.
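The "Relative Binding" columns of Table 145 are the eluted percentage for each strain divided by the eluted percentage for the non-display phage M13MP18. The sketch below recomputes them from the tabulated values. The function name `relative_binding` and the dictionary layout are illustrative choices, not part of the specification.

```python
def relative_binding(eluted_percent, reference="M13MP18"):
    """Divide each strain's eluted percentage by that of the reference phage."""
    ref = eluted_percent[reference]
    return {strain: pct / ref for strain, pct in eluted_percent.items()}


# Eluted phage (percentage of input) as listed in Table 145.
anhydrotrypsin = {"M13MP18": 0.1, "BPTI-IIIMK": 9.1,
                  "M13.3X7": 25.0, "M13.3X11": 9.2}
trypsin = {"M13MP18": 2.3e-4, "BPTI-IIIMK": 1.17,
           "M13.3X7": 1.4, "M13.3X11": 0.27}

for label, data in (("anhydrotrypsin", anhydrotrypsin), ("trypsin", trypsin)):
    for strain, ratio in relative_binding(data).items():
        print(f"{label:15s} {strain:11s} {ratio:10.1f}")
```

The trypsin ratios computed this way agree with the table once rounded to one or two significant figures, which appears to be how the tabulated values were reported.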

TABLE 146
______________________________________
Binding of Display Phage to Trypsin or Human Neutrophil Elastase.

              Trypsin Beads                   HNE Beads
              Eluted          Relative        Eluted          Relative
Strain        Phage (a)       Binding (b)     Phage           Binding
______________________________________
M13MP18       5 × 10.sup.-4   1               3 × 10.sup.-4   1.0
BPTI-IIIMK    1.0             2000            5 × 10.sup.-3   16.7
M13MB48       0.13            260             9 × 10.sup.-3   30.0
M13.3X7       1.15            2300            1 × 10.sup.-3   3.3
M13.3X11      0.8             1600            2 × 10.sup.-3   6.7
BPTI3.CL (c)  1 × 10.sup.-3   2               4.1             1.4 × 10.sup.4
______________________________________
(a) Plaque forming units acid eluted from the beads, expressed as a
percentage of input.
(b) Relative to the non-display phage, M13MP18.
(c) BPTI-IIIMK (K15L MGNG)

TABLE 155
______________________________________
Distance in Å between alpha carbons in octapeptides:

      1      2      3      4      5      6      7      8
______________________________________
Extended Strand: angle of C.sub.α 1-C.sub.α 2-C.sub.α 3 = 138°
1     --
2     3.8    --
3     7.1    3.8    --
4     10.7   7.1    3.8    --
5     14.2   10.7   7.1    3.8    --
6     17.7   14.1   10.7   7.1    3.8    --
7     21.2   17.7   14.1   10.6   7.0    3.8    --
8     24.6   20.9   17.5   13.9   10.6   7.0    3.8    --

Reverse turn between residues 4 and 5.
1     --
2     3.8    --
3     7.1    3.8    --
4     10.6   7.0    3.8    --
5     11.6   8.0    6.1    3.8    --
6     9.0    5.8    5.5    5.6    3.8    --
7     6.2    4.1    6.3    8.0    7.0    3.8    --
8     5.8    6.0    9.1    11.6   10.7   7.2    3.8    --

Alpha helix: angle of C.sub.α 1-C.sub.α 2-C.sub.α 3 = 93°
1     --
2     3.8    --
3     5.5    3.8    --
4     5.1    5.4    3.8    --
5     6.6    5.3    5.5    3.8    --
6     9.3    7.0    5.6    5.5    3.8    --
7     10.4   9.3    6.9    5.4    5.5    3.8    --
8     11.3   10.7   9.5    6.8    5.6    5.6    3.8    --
______________________________________
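The 1-3 distances in Table 155 are fixed by the two quantities the table states for each conformation: the 3.8 Å spacing between adjacent alpha carbons and the Cα1-Cα2-Cα3 angle. For an isosceles triangle with two sides of length b enclosing an angle θ, the third side is 2·b·sin(θ/2). The sketch below applies that relation. The function name `ca_1_3_distance` is illustrative and not taken from the specification.

```python
import math

CA_CA = 3.8  # distance in Å between adjacent alpha carbons, as in Table 155


def ca_1_3_distance(angle_deg, bond=CA_CA):
    """Distance between C-alpha i and i+2 for a given C-alpha virtual bond angle."""
    return 2.0 * bond * math.sin(math.radians(angle_deg) / 2.0)


print(f"extended strand (138 deg): {ca_1_3_distance(138):.1f} Å")
print(f"alpha helix     (93 deg):  {ca_1_3_distance(93):.1f} Å")
```

Longer-range entries (1-4 and beyond) also depend on the torsion about each virtual bond, so they cannot be recovered from the angle alone.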

TABLE 156
______________________________________
Distances between alpha carbons in closed mini-proteins of
the form disulfide cyclo (CXXXXC)
______________________________________
      1      2      3      4      5      6
______________________________________
Minimum distance
1     --
2     3.8    --
3     5.9    3.8    --
4     5.6    6.0    3.8    --
5     4.7    5.9    6.0    3.8    --
6     4.8    5.3    5.1    5.2    3.8    --

Average distance
1     --
2     3.8    --
3     6.3    3.8    --
4     7.5    6.4    3.8    --
5     7.1    7.5    6.3    3.8    --
6     5.6    7.5    7.7    6.4    3.8    --

Maximum distance
1     --
2     3.8    --
3     6.7    3.8    --
4     9.0    6.9    3.8    --
5     8.7    8.8    6.8    3.8    --
6     6.6    9.2    9.1    6.8    3.8    --
______________________________________

TABLE 160
______________________________________
pH Profile of BPTI-III MK phage and EpiNE1 phage binding to Cat G beads.

pH      Total pfu in Fraction    Percentage of Input
______________________________________
BPTI-IIIMK (BPTI has SEQ ID NO: 44)
7       3.7 × 10.sup.5           3.7 × 10.sup.-2
6       3.1 × 10.sup.5           3.1 × 10.sup.-2
5       1.4 × 10.sup.5           1.4 × 10.sup.-2
4.5     3.1 × 10.sup.4           3.1 × 10.sup.-3
4       7.1 × 10.sup.3           7.1 × 10.sup.-4
3.5     2.6 × 10.sup.3           2.6 × 10.sup.-4
3       2.5 × 10.sup.3           2.5 × 10.sup.-4
2.5     8.8 × 10.sup.2           8.8 × 10.sup.-5
2       7.6 × 10.sup.2           7.6 × 10.sup.-5
(total input = 1 × 10.sup.9 phage)

Shad 1 EpiNE1 (EpiNE1 has SEQ ID NO: 51)
7       2.5 × 10.sup.5           1.1 × 10.sup.-2
6       6.3 × 10.sup.4           2.7 × 10.sup.-3
5       7.4 × 10.sup.4           3.1 × 10.sup.-3
4.5     7.1 × 10.sup.4           3.0 × 10.sup.-3
4       4.1 × 10.sup.4           1.7 × 10.sup.-3
3.5     3.3 × 10.sup.4           1.4 × 10.sup.-3
3       2.5 × 10.sup.3           1.1 × 10.sup.-4
2.5     1.4 × 10.sup.4           5.7 × 10.sup.-4
2       5.2 × 10.sup.3           2.2 × 10.sup.-4
(total input = 2.35 × 10.sup.8 phage).
______________________________________

TABLE 201
______________________________________
Elution of Bound Fusion Phage from Immobilized Active Trypsin

                        Total Plaque-
                        Forming Units       Percent of
Type of                 Recovered in        Input Phage
Phage         Buffer    Elution Buffer      Recovered          Ratio
______________________________________
BPTI-III MK   CBS       8.80 · 10.sup.7     4.7 · 10.sup.-1    1675
MK            CBS       1.35 · 10.sup.6     2.8 · 10.sup.-4
BPTI-III MK   TBS       1.32 · 10.sup.8     7.2 · 10.sup.-1    2103
MK            TBS       1.48 · 10.sup.6     3.4 · 10.sup.-4
______________________________________
The total input for BPTI-III MK phage was 1.85 · 10.sup.10 plaque-forming
units while the input for MK phage was 4.65 · 10.sup.11 plaque-forming
units.

TABLE 202
______________________________________
Elution of BPTI-III MK and BPTI(K15L)-III MA Phage from
Immobilized Trypsin and HNE

                                    Total Plaque-
                                    Forming Units     Percentage of
Type of               Immobilized   in Elution        Input Phage
Phage                 Protease      Fraction          Recovered
______________________________________
BPTI-III MK           Trypsin       2.1 · 10.sup.7    4.1 · 10.sup.-1
BPTI-III MK           HNE           2.6 · 10.sup.5    5 · 10.sup.-3
BPTI(K15L)-III MA     Trypsin       5.2 · 10.sup.4    5 · 10.sup.-3
BPTI(K15L)-III MA     HNE           1.0 · 10.sup.6    1.0 · 10.sup.-1
______________________________________
The total input of BPTI-III MK phage was 5.1 · 10.sup.9 pfu and the input
of BPTI(K15L)-III MA phage was 9.6 · 10.sup.8 pfu.

TABLE 203
______________________________________
Effect of pH on the Dissociation of Bound BPTI-III MK and
BPTI(K15L)-III MA Phage from Immobilized HNE

        BPTI-III MK                       BPTI(K15L)-III MA
        Total Plaque-    %                Total Plaque-    %
        Forming Units    of Input         Forming Units    of Input
pH      in Fraction      Phage            in Fraction      Phage
______________________________________
7.0     5.0 · 10.sup.4   2 · 10.sup.-3    1.7 · 10.sup.5   3.2 · 10.sup.-2
6.0     3.8 · 10.sup.4   2 · 10.sup.-3    4.5 · 10.sup.5   8.6 · 10.sup.-2
5.0     3.5 · 10.sup.4   1 · 10.sup.-3    2.1 · 10.sup.6   4.0 · 10.sup.-1
4.0     3.0 · 10.sup.4   1 · 10.sup.-3    4.3 · 10.sup.6   8.2 · 10.sup.-1
3.0     1.4 · 10.sup.4   1 · 10.sup.-3    1.1 · 10.sup.6   2.1 · 10.sup.-1
2.2     2.9 · 10.sup.4   1 · 10.sup.-3    5.9 · 10.sup.4   1.1 · 10.sup.-2

        Percentage of Input Phage         Percentage of Input Phage
        Recovered = 8.0 · 10.sup.-3       Recovered = 1.56
______________________________________
The total input of BPTI-III MK phage was 0.030 ml × (8.6 · 10.sup.10
pfu/ml) = 2.6 · 10.sup.9.
The total input of BPTI(K15L)-III MA phage was 0.030 ml × (1.7 · 10.sup.10
pfu/ml) = 5.2 · 10.sup.8.
Given that the infectivity of BPTI(K15L)-III MA phage is 5 fold lower than
that of BPTI-III MK phage, the phage inputs utilized above ensure that an
equivalent number of phage particles are added to the immobilized HNE.
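The arithmetic behind the notes to Table 203 is a product of volume and titer for the input, followed by a ratio for each eluted fraction. The sketch below repeats it for the BPTI(K15L)-III MA column using the tabulated values. The function names are illustrative, and small differences from the printed figures are expected because the table entries are rounded.

```python
def total_input_pfu(volume_ml, titer_pfu_per_ml):
    """Input phage = volume applied times stock titer."""
    return volume_ml * titer_pfu_per_ml


def percent_of_input(fraction_pfu, input_pfu):
    """Plaque-forming units in a fraction as a percentage of the input."""
    return 100.0 * fraction_pfu / input_pfu


# BPTI(K15L)-III MA column of Table 203: pH -> pfu recovered in the fraction.
k15l_fractions = {7.0: 1.7e5, 6.0: 4.5e5, 5.0: 2.1e6,
                  4.0: 4.3e6, 3.0: 1.1e6, 2.2: 5.9e4}

k15l_input = total_input_pfu(0.030, 1.7e10)  # about 5.1e8 pfu
for ph, pfu in k15l_fractions.items():
    print(f"pH {ph}: {percent_of_input(pfu, k15l_input):.2e} % of input")

total = sum(percent_of_input(pfu, k15l_input) for pfu in k15l_fractions.values())
print(f"total recovered: {total:.2f} % of input")
```

The largest fraction falls at pH 4.0, matching the peak in the tabulated elution profile for this phage.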

TABLE 204
______________________________________
Effect of Mutation of Residues 39 to 42 of BPTI on the ability of
BPTI(K15L)-III MA to Bind to Immobilized HNE

        BPTI(K15L)-III MA                  BPTI(K15L, MGNG)-III MA
        Total Plaque-    %                 Total Plaque-    %
pH      Forming Units    Input             Forming Units    Input
______________________________________
7.0     3.0 · 10.sup.5   8.2 · 10.sup.-2   4.5 · 10.sup.5   1.63 · 10.sup.-1
6.0     3.6 · 10.sup.5   1.00 · 10.sup.-1  6.3 · 10.sup.5   2.27 · 10.sup.-1
5.5     5.3 · 10.sup.5   1.46 · 10.sup.-1  7.3 · 10.sup.5   2.64 · 10.sup.-1
5.0     5.6 · 10.sup.5   1.52 · 10.sup.-1  8.7 · 10.sup.5   3.16 · 10.sup.-1
4.75    9.9 · 10.sup.5   2.76 · 10.sup.-1  1.3 · 10.sup.6   4.60 · 10.sup.-1
4.5     3.1 · 10.sup.5   8.5 · 10.sup.-2   3.6 · 10.sup.5   1.30 · 10.sup.-1
4.25    5.2 · 10.sup.5   1.42 · 10.sup.-1  5.0 · 10.sup.5   1.80 · 10.sup.-1
4.0     5.1 · 10.sup.4   1.4 · 10.sup.-2   1.3 · 10.sup.5   4.8 · 10.sup.-2
3.5     1.3 · 10.sup.4   4 · 10.sup.-3     3.8 · 10.sup.4   1.4 · 10.sup.-2

        Total Percentage                   Total Percentage
        Recovered = 1.00                   Recovered = 1.80
______________________________________
The total input of BPTI(K15L)-III MA phage was 0.030 ml × (1.2 · 10.sup.10
pfu/ml) = 3.6 · 10.sup.8 pfu.
The total input of BPTI(K15L, MGNG)-III MA phage was 0.030 ml × (9.2 ·
10.sup.9 pfu/ml) = 2.8 · 10.sup.8 pfu.

                  TABLE 205
______________________________________
Fractionation of a Mixture of BPTI-III MK and
BPTI(K15L, MGNG)-III MA Phage on Immobilized HNE
        BPTI-III MK                            BPTI(K15L, MGNG)-III MA
        Total Kanamycin                        Total Ampicillin
pH      Transducing Units   % of Input         Transducing Units   % of Input
______________________________________
7.0     4.01 · 10.sup.3     4.5 · 10.sup.-3    1.39 · 10.sup.5     3.13 · 10.sup.-1
6.0     7.06 · 10.sup.2       8 · 10.sup.-4    7.18 · 10.sup.4     1.62 · 10.sup.-1
5.0     1.81 · 10.sup.3     2.0 · 10.sup.-3    1.35 · 10.sup.5     3.04 · 10.sup.-1
4.0     1.49 · 10.sup.3     1.7 · 10.sup.-3    7.43 · 10.sup.5     1.673
______________________________________
The total input of BPTI-III MK phage was 0.015 ml × (5.94 · 10.sup.9 kanamycin transducing units/ml) = 8.91 · 10.sup.7 kanamycin transducing units.
The total input of BPTI(K15L, MGNG)-III MA phage was 0.015 ml × (2.96 · 10.sup.9 ampicillin transducing units/ml) = 4.44 · 10.sup.7 ampicillin transducing units.
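
The footnote arithmetic for Table 205 (total input = volume applied × titer) and the "% of Input" column (units recovered ÷ total input × 100) can be checked with a short script. This is a minimal sketch; the function names `total_input` and `percent_of_input` are hypothetical and chosen only for illustration.

```python
def total_input(volume_ml, titer_per_ml):
    """Total units applied to the beads: volume (ml) times titer (units/ml)."""
    return volume_ml * titer_per_ml


def percent_of_input(recovered, input_units):
    """Units recovered in a fraction, expressed as a percentage of the input."""
    return 100.0 * recovered / input_units


# Volumes and titers from the Table 205 footnotes.
mk_input = total_input(0.015, 5.94e9)  # BPTI-III MK, kanamycin transducing units
ma_input = total_input(0.015, 2.96e9)  # BPTI(K15L, MGNG)-III MA, ampicillin transducing units

# Recovered units from the pH 7.0 row of Table 205.
mk_pct = percent_of_input(4.01e3, mk_input)
ma_pct = percent_of_input(1.39e5, ma_input)

print(f"MK input {mk_input:.3g}, pH 7.0 recovery {mk_pct:.2g} % of input")
print(f"MA input {ma_input:.3g}, pH 7.0 recovery {ma_pct:.3g} % of input")
```

The same two functions apply to the plaque-forming-unit tables, with pfu in place of transducing units.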

                  TABLE 206
______________________________________
Characterization of the Affinity of
BPTI(K15V, R17L)-III MA Phage for Immobilized HNE
        BPTI(K15V, R17L)-III MA                BPTI(K15L, MGNG)-III MA
        Total Plaque-       Percentage         Total Plaque-       Percentage
        Forming Units       of Input           Forming Units       of Input
pH      Recovered           Phage              Recovered           Phage
______________________________________
7.0     3.19 · 10.sup.6      8.1 · 10.sup.-2   9.42 · 10.sup.4     4.6 · 10.sup.-2
6.0     5.42 · 10.sup.6     1.38 · 10.sup.-1   1.61 · 10.sup.5     7.9 · 10.sup.-2
5.0     9.45 · 10.sup.6     2.41 · 10.sup.-1   2.85 · 10.sup.5     1.39 · 10.sup.-1
4.5     1.39 · 10.sup.7     3.55 · 10.sup.-1   4.32 · 10.sup.5     2.11 · 10.sup.-1
4.0     2.02 · 10.sup.7     5.15 · 10.sup.-1   1.42 · 10.sup.5     6.9 · 10.sup.-2
3.75    9.20 · 10.sup.6     2.35 · 10.sup.-1   --                  --
3.5     4.16 · 10.sup.6     1.06 · 10.sup.-1   5.29 · 10.sup.4     2.6 · 10.sup.-2
3.0     2.65 · 10.sup.6      6.8 · 10.sup.-2   --                  --
        Total Input Recovered = 1.73           Total Input Recovered = 0.57
______________________________________
Total input of BPTI(K15V, R17L)-III MA phage was 0.040 ml × (9.80 · 10.sup.10 pfu/ml) = 3.92 · 10.sup.9 pfu.
Total input of BPTI(K15L, MGNG)-III MA phage was 0.040 ml × (5.13 · 10.sup.9 pfu/ml) = 2.05 · 10.sup.8 pfu.

                  TABLE 207
______________________________________
Sequence of the EpiNEα Clone Selected
From the Mini-Library
______________________________________
13  14  15  16  17  18  19  20  21
 P   C   V   A   M   F   Q   R   Y
CCT.TGC.GTG.GCT.ATG.TTC.CAA.CGC.TAT
(SEQ ID NO: 45)
______________________________________

                  TABLE 208
______________________________________
SEQUENCES OF THE EpiNE CLONES
IN THE P1 REGION
CLONE
IDENTIFIERS   SEQUENCE
______________________________________
EpiNE3 (amino-acid: SEQ ID NO: 46)
              13  14  15  16  17  18  19  20  21
3, 9, 16,      P   C   V   G   F   F   S   R   Y
17, 18, 19    CCT.TGC.GTC.GGT.TTC.TTC.TCA.CGC.TAT
              (DNA: SEQ ID NO: 109)
EpiNE6 (amino-acid: SEQ ID NO: 47)
              13  14  15  16  17  18  19  20  21
6              P   C   V   G   F   F   Q   R   Y
              CCT.TGC.GTC.GGT.TTC.TTC.CAA.CGC.TAT
              (DNA: SEQ ID NO: 110)
EpiNE7 (amino-acid: SEQ ID NO: 48)
              13  14  15  16  17  18  19  20  21
7, 13, 14,     P   C   V   A   M   F   P   R   Y
15, 20        CCT.TGC.GTC.GCT.ATG.TTC.CCA.CGC.TAT
              (DNA: SEQ ID NO: 111)
EpiNE4 (amino-acid: SEQ ID NO: 49)
              13  14  15  16  17  18  19  20  21
4              P   C   V   A   I   F   P   R   Y
              CCT.TGC.GTC.GCT.ATC.TTC.CCA.CGC.TAT
              (DNA: SEQ ID NO: 112)
EpiNE8 (amino-acid: SEQ ID NO: 50)
              13  14  15  16  17  18  19  20  21
8              P   C   V   A   I   F   K   R   S
              CCT.TGC.GTC.GCT.ATC.TTC.AAA.CGC.TCT
              (DNA: SEQ ID NO: 113)
EpiNE1 (amino-acid: SEQ ID NO: 51)
              13  14  15  16  17  18  19  20  21
1, 10,         P   C   I   A   F   F   P   R   Y
11, 12        CCT.TGC.ATC.GCT.TTC.TTC.CCA.CGC.TAT
              (DNA: SEQ ID NO: 114)
EpiNE5 (amino-acid: SEQ ID NO: 52)
              13  14  15  16  17  18  19  20  21
5              P   C   I   A   F   F   Q   R   Y
              CCT.TGC.ATC.GCT.TTC.TTC.CAA.CGC.TAT
              (DNA: SEQ ID NO: 115)
EpiNE2 (amino-acid: SEQ ID NO: 53)
              13  14  15  16  17  18  19  20  21
2              P   C   I   A   L   F   K   R   Y
              CCT.TGC.ATC.GCT.TTG.TTC.AAA.CGC.TAT
              (DNA: SEQ ID NO: 116)
______________________________________
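
The amino-acid rows of Table 208 follow from the DNA rows under the standard genetic code. As a minimal sketch, assuming the dot-separated codon notation used in the table, the pairing can be verified as follows; the helper name `translate` and the dictionary `CODON_TABLE` are hypothetical names introduced for this illustration.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acids for the 64 codons of the standard genetic code,
# enumerated with T, C, A, G at each position (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dotted_dna):
    """Translate a dot-separated codon string into one-letter amino acids."""
    return "".join(CODON_TABLE[codon] for codon in dotted_dna.split("."))


# DNA sequences for residues 13-21, copied from Table 208.
clones = {
    "EpiNE3": "CCT.TGC.GTC.GGT.TTC.TTC.TCA.CGC.TAT",
    "EpiNE7": "CCT.TGC.GTC.GCT.ATG.TTC.CCA.CGC.TAT",
    "EpiNE8": "CCT.TGC.GTC.GCT.ATC.TTC.AAA.CGC.TCT",
}

for name, dna in clones.items():
    print(name, translate(dna))
```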

                                      TABLE 209
__________________________________________________________________________
DNA sequences and predicted amino acid sequences
around the P1 region of BPTI analogues selected
for binding to Cathepsin G.
__________________________________________________________________________
Clone              P1
                   15    16    17    18    19
BPTI               AAA . GCG . CGC . ATC . ATC
(SEQ ID NO: 44)    LYS   ALA   ARG   ILE   ILE
EpiC 1 (a)         ATG . GGT . TTC . TCC . AAA    SEQ ID NO: 117
(SEQ ID NO: 54)    MET   GLY   PHE   SER   LYS
EpiC 7             ATG . GCT . TTG . TTC . AAA    SEQ ID NO: 118
(SEQ ID NO: 55)    MET   ALA   LEU   PHE   LYS
EpiC 8 (b)         TTC . GCT . ATC . ACC . CCA    SEQ ID NO: 119
(SEQ ID NO: 56)    PHE   ALA   ILE   THR   PRO
EpiC 10            ATG . GCT . TTG . TTC . CAA    SEQ ID NO: 120
(SEQ ID NO: 57)    MET   ALA   LEU   PHE   GLN
EpiC 20            ATG . GCT . ATC . TCC . CCA    SEQ ID NO: 121
(SEQ ID NO: 58)    MET   ALA   ILE   SER   PRO
__________________________________________________________________________
(a) Clones 11 and 31 also had the identical sequence.
(b) Clone 8 also contained the mutation Tyr 10 to Asn.

                                      TABLE 210
__________________________________________________________________________
Derivatives of EpiNE7 (SEQ ID NO:48) Obtained by Variegation at positions
34, 36, 39, 40 and 41
__________________________________________________________________________
EpiNE7 (SEQ ID NO:48)
##STR696##  ##STR697##  ##STR698##  ##STR699##
##STR700##  ##STR701##  ##STR702##  ##STR703##
##STR704##  ##STR705##  ##STR706##  ##STR707##
##STR708##  ##STR709##  ##STR710##  ##STR711##
##STR712##  ##STR713##  ##STR714##  ##STR715##
##STR716##  ##STR717##  ##STR718##  ##STR719##
##STR720##  ##STR721##  ##STR722##  ##STR723##
##STR724##  ##STR725##  ##STR726##  ##STR727##
__________________________________________________________________________
Notes:
a)   indicates variegated residue. * indicates imposed change. ↓ indicates carry over from EpiNE7.
b) The sequence M.sub.39 -GNG in EpiNE7 (indicated by *) was imposed to increase similarity to ITID1.
c) Lower case letters in EpiNE7.6 to 7.38 indicate changes from BPTI that were selected in the first round (residues 15-19) or positions where the PBD was variegated in the second round (residues 34, 36, 39, 40, and 41).
d) All EpiNE7 derivatives have G.sub.42.

                  TABLE 211
______________________________________
Effects of antisera on phage infectivity
Phage
(dilution      Incubation                        Relative
of stock)      Conditions    pfu/ml              Titer
______________________________________
MA-ITI         PBS           1.2 · 10.sup.11     1.00
(10.sup.-1)    NRS           6.8 · 10.sup.10     0.57
               anti-ITI      1.1 · 10.sup.10     0.09
MA-ITI         PBS           7.7 · 10.sup.8      1.00
(10.sup.-3)    NRS           6.7 · 10.sup.8      0.87
               anti-ITI      8.0 · 10.sup.6      0.01
MA             PBS           1.3 · 10.sup.12     1.00
(10.sup.-1)    NRS           1.4 · 10.sup.12     1.10
               anti-ITI      1.6 · 10.sup.12     1.20
MA             PBS           1.3 · 10.sup.10     1.00
(10.sup.-3)    NRS           1.2 · 10.sup.10     0.92
               anti-ITI      1.5 · 10.sup.10     1.20
______________________________________
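
The "Relative Titer" column of Table 211 appears to be each titer divided by the PBS titer for the same phage and dilution, since every PBS row reads 1.00. The sketch below works under that assumption; the function name `relative_titers` is hypothetical.

```python
def relative_titers(titers, control="PBS"):
    """Divide each pfu/ml value by the control (assumed here to be PBS)."""
    reference = titers[control]
    return {condition: value / reference for condition, value in titers.items()}


# pfu/ml values for MA-ITI phage at the 10^-1 dilution, from Table 211.
ma_iti = {"PBS": 1.2e11, "NRS": 6.8e10, "anti-ITI": 1.1e10}

for condition, ratio in relative_titers(ma_iti).items():
    print(f"{condition:9s} {ratio:.2f}")
```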

                  TABLE 212
______________________________________
Fractionation of EpiNE-7 and MA-ITI phage on HNE beads
                        EpiNE-7                            MA-ITI
                        Total pfu        Fraction          Total pfu         Fraction
Sample                  in sample        of input          in sample         of input
______________________________________
INPUT                   3.3 · 10.sup.9   1.00              3.4 · 10.sup.11   1.00
Final TBS-TWEEN Wash    3.8 · 10.sup.5   1.2 · 10.sup.-4   1.8 · 10.sup.6    5.3 · 10.sup.-6
pH 7.0                  6.2 · 10.sup.5   1.8 · 10.sup.-4   1.6 · 10.sup.6    4.7 · 10.sup.-6
pH 6.0                  1.4 · 10.sup.6   4.1 · 10.sup.-4   1.0 · 10.sup.6    2.9 · 10.sup.-6
pH 5.5                  9.4 · 10.sup.5   2.8 · 10.sup.-4   1.6 · 10.sup.6    4.7 · 10.sup.-6
pH 5.0                  9.5 · 10.sup.5   2.9 · 10.sup.-4   3.1 · 10.sup.5    9.1 · 10.sup.-7
pH 4.5                  1.2 · 10.sup.6   3.5 · 10.sup.-4   1.2 · 10.sup.5    3.5 · 10.sup.-7
pH 4.0                  1.6 · 10.sup.6   4.8 · 10.sup.-4   7.2 · 10.sup.4    2.1 · 10.sup.-7
pH 3.5                  9.5 · 10.sup.5   2.9 · 10.sup.-4   4.9 · 10.sup.4    1.4 · 10.sup.-7
pH 3.0                  6.6 · 10.sup.5   2.0 · 10.sup.-4   2.9 · 10.sup.4    8.5 · 10.sup.-8
pH 2.5                  1.6 · 10.sup.5   4.8 · 10.sup.-5   1.4 · 10.sup.4    4.1 · 10.sup.-8
pH 2.0                  3.0 · 10.sup.5   9.1 · 10.sup.-5   1.7 · 10.sup.4    5.0 · 10.sup.-8
SUM*                    6.4 · 10.sup.6     3 · 10.sup.-3   5.7 · 10.sup.6      2 · 10.sup.-5
______________________________________
*SUM is the total pfu (or fraction of input) obtained from all pH elution fractions

                  TABLE 213
______________________________________
Fractionation of EpiC-10 and MA-ITI phage on Cat-G beads
                        EpiC-10                             MA-ITI
                        Total pfu         Fraction          Total pfu         Fraction
Sample                  in sample         of input          in sample         of input
______________________________________
INPUT                   5.0 · 10.sup.11   1.00              4.6 · 10.sup.11   1.00
Final TBS-TWEEN Wash    1.8 · 10.sup.7    3.6 · 10.sup.-5   7.1 · 10.sup.6    1.5 · 10.sup.-5
pH 7.0                  1.5 · 10.sup.7    3.0 · 10.sup.-5   6.1 · 10.sup.6    1.3 · 10.sup.-5
pH 6.0                  2.3 · 10.sup.7    4.6 · 10.sup.-5   2.3 · 10.sup.6    5.0 · 10.sup.-6
pH 5.5                  2.5 · 10.sup.7    5.0 · 10.sup.-5   1.2 · 10.sup.6    2.6 · 10.sup.-6
pH 5.0                  2.1 · 10.sup.7    4.2 · 10.sup.-5   1.1 · 10.sup.6    2.4 · 10.sup.-6
pH 4.5                  1.1 · 10.sup.7    2.2 · 10.sup.-5   6.7 · 10.sup.5    1.5 · 10.sup.-6
pH 4.0                  1.9 · 10.sup.6    3.8 · 10.sup.-6   4.4 · 10.sup.5    9.6 · 10.sup.-7
pH 3.5                  1.1 · 10.sup.6    2.2 · 10.sup.-6   4.4 · 10.sup.5    9.6 · 10.sup.-7
pH 3.0                  4.8 · 10.sup.5    9.6 · 10.sup.-7   3.6 · 10.sup.5    7.8 · 10.sup.-7
pH 2.5                  2.0 · 10.sup.5    4.0 · 10.sup.-7   2.7 · 10.sup.5    5.9 · 10.sup.-7
pH 2.0                  2.4 · 10.sup.5    4.8 · 10.sup.-7   3.2 · 10.sup.5    7.0 · 10.sup.-7
SUM*                    9.9 · 10.sup.7      2 · 10.sup.-4   1.4 · 10.sup.7      3 · 10.sup.-5
______________________________________
*SUM is the total pfu (or fraction of input) obtained from all pH elution fractions
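
The SUM row of Table 213 is defined in the footnote as the total over all pH elution fractions, so it excludes the final TBS-TWEEN wash. A minimal sketch of that calculation for the EpiC-10 column follows; the function name `sum_elution` is hypothetical.

```python
def sum_elution(pfu_by_ph, input_pfu):
    """Return (total pfu over all pH elution fractions, that total as a fraction of input)."""
    total = sum(pfu_by_ph.values())
    return total, total / input_pfu


# EpiC-10 pfu per pH elution fraction, from Table 213 (wash fraction not included).
epic10 = {
    7.0: 1.5e7, 6.0: 2.3e7, 5.5: 2.5e7, 5.0: 2.1e7, 4.5: 1.1e7,
    4.0: 1.9e6, 3.5: 1.1e6, 3.0: 4.8e5, 2.5: 2.0e5, 2.0: 2.4e5,
}

total, fraction = sum_elution(epic10, input_pfu=5.0e11)
print(f"total pfu eluted: {total:.2g}, fraction of input: {fraction:.1g}")
```

The tabulated entries are rounded, so a recomputed sum can differ slightly from the printed SUM row.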

                  TABLE 214
______________________________________
Abbreviated fractionation of display phage on HNE beads
                         DISPLAY PHAGE
         EpiNE-7        MA-ITI 2        MA-ITI-E7 1    MA-ITI-E7 2
______________________________________
INPUT    1.00           1.00            1.00           1.00
(pfu)    (1.8 · 10⁹)    (1.2 · 10¹⁰)    (3.3 · 10⁹)    (1.1 · 10⁹)
WASH     6 · 10⁻⁵       1 · 10⁻⁵        2 · 10⁻⁵       2 · 10⁻⁵
pH 7.0   3 · 10⁻⁴       1 · 10⁻⁵        2 · 10⁻⁵       4 · 10⁻⁵
pH 3.5   3 · 10⁻³       3 · 10⁻⁶        8 · 10⁻⁵       8 · 10⁻⁵
pH 2.0   1 · 10⁻³       1 · 10⁻⁶        6 · 10⁻⁶       2 · 10⁻⁵
SUM*     4.3 · 10⁻³     1.4 · 10⁻⁵      1.1 · 10⁻⁴     1.4 · 10⁻⁴
______________________________________
*SUM is the total fraction of input pfu obtained from all pH elution fractions
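The SUM row of Table 214 can be checked with a few lines of arithmetic. The following is a minimal sketch and not part of the original disclosure: the dictionary layout and the helper name `sum_elution_fractions` are assumptions made for illustration, and the numbers are transcribed from the table. Only the pH elution fractions are summed; the WASH fraction is left out, which is consistent with the footnote.

```python
# Fractions of input pfu transcribed from Table 214 (display phage on HNE beads).
# The data layout and the function name are illustrative assumptions.
TABLE_214 = {
    "EpiNE-7":     {"WASH": 6e-5, "pH 7.0": 3e-4, "pH 3.5": 3e-3, "pH 2.0": 1e-3},
    "MA-ITI 2":    {"WASH": 1e-5, "pH 7.0": 1e-5, "pH 3.5": 3e-6, "pH 2.0": 1e-6},
    "MA-ITI-E7 1": {"WASH": 2e-5, "pH 7.0": 2e-5, "pH 3.5": 8e-5, "pH 2.0": 6e-6},
    "MA-ITI-E7 2": {"WASH": 2e-5, "pH 7.0": 4e-5, "pH 3.5": 8e-5, "pH 2.0": 2e-5},
}


def sum_elution_fractions(fractions):
    """Sum the fractions recovered in the pH elution steps (WASH excluded)."""
    return sum(value for step, value in fractions.items() if step.startswith("pH"))


for phage, fractions in TABLE_214.items():
    # Two significant figures, the precision used in the table.
    print(f"{phage:12s} SUM = {sum_elution_fractions(fractions):.1e}")
```

Rounded to two significant figures, the four sums agree with the tabulated SUM row (4.3 · 10⁻³, 1.4 · 10⁻⁵, 1.1 · 10⁻⁴, and 1.4 · 10⁻⁴).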

                  TABLE 215
______________________________________
Fractionation of EpiNE-7 and MA-ITI-E7 phage on HNE beads
                   EpiNE-7                      MA-ITI-E7
           Total pfu      Fraction       Total pfu      Fraction
Sample     in sample      of input       in sample      of input
______________________________________
INPUT      1.8 · 10⁹      1.00           3.0 · 10⁹      1.00
pH 7.0     5.2 · 10⁵      2.9 · 10⁻⁴     6.4 · 10⁴      2.1 · 10⁻⁵
pH 6.0     6.4 · 10⁵      3.6 · 10⁻⁴     4.5 · 10⁴      1.5 · 10⁻⁵
pH 5.5     7.8 · 10⁵      4.3 · 10⁻⁴     5.0 · 10⁴      1.7 · 10⁻⁵
pH 5.0     8.4 · 10⁵      4.7 · 10⁻⁴     5.2 · 10⁴      1.7 · 10⁻⁵
pH 4.5     1.1 · 10⁶      6.1 · 10⁻⁴     4.4 · 10⁴      1.5 · 10⁻⁵
pH 4.0     1.7 · 10⁶      9.4 · 10⁻⁴     2.6 · 10⁴      8.7 · 10⁻⁶
pH 3.5     1.1 · 10⁶      6.1 · 10⁻⁴     1.3 · 10⁴      4.3 · 10⁻⁶
pH 3.0     3.8 · 10⁵      2.1 · 10⁻⁴     5.6 · 10³      1.9 · 10⁻⁶
pH 2.5     2.8 · 10⁵      1.6 · 10⁻⁴     4.9 · 10³      1.6 · 10⁻⁶
pH 2.0     2.9 · 10⁵      1.6 · 10⁻⁴     2.2 · 10³      7.3 · 10⁻⁷
SUM*       7.6 · 10⁶      4.1 · 10⁻³     3.1 · 10⁵      1.1 · 10⁻⁴
______________________________________
*SUM is the total pfu (or fraction of input) obtained from all pH elution fractions
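The "fraction of input" columns of Table 215 are the pfu recovered in each fraction divided by the input pfu. The sketch below illustrates that arithmetic for the EpiNE-7 columns; it is not part of the original disclosure, and the variable and function names are assumptions. Because the tabulated values are rounded to two significant figures, recomputed fractions may differ from the table in the last digit.

```python
# Total pfu per elution fraction, transcribed from Table 215 (EpiNE-7 columns).
# Names below are illustrative assumptions, not taken from the source.
INPUT_PFU = 1.8e9
ELUTED_PFU = {
    "pH 7.0": 5.2e5, "pH 6.0": 6.4e5, "pH 5.5": 7.8e5, "pH 5.0": 8.4e5,
    "pH 4.5": 1.1e6, "pH 4.0": 1.7e6, "pH 3.5": 1.1e6, "pH 3.0": 3.8e5,
    "pH 2.5": 2.8e5, "pH 2.0": 2.9e5,
}


def fraction_of_input(pfu, input_pfu):
    """Fraction of the input phage recovered in one elution fraction."""
    return pfu / input_pfu


fractions = {ph: fraction_of_input(pfu, INPUT_PFU) for ph, pfu in ELUTED_PFU.items()}
total_pfu = sum(ELUTED_PFU.values())          # compare with the SUM row
peak_ph = max(ELUTED_PFU, key=ELUTED_PFU.get)  # fraction with the most phage

for ph, frac in fractions.items():
    print(f"{ph}: {ELUTED_PFU[ph]:.1e} pfu, fraction of input {frac:.1e}")
print(f"total eluted pfu: {total_pfu:.2e}; peak fraction: {peak_ph}")
```

With the tabulated inputs, the summed pfu agrees with the SUM row to two significant figures, and the largest single fraction is the pH 4.0 elution.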

CITATIONS

AKOH72: Ako, H, RJ Foster, and CA Ryan, "The preparation of anhydro-trypsin and its reactivity with naturally occurring proteinase inhibitors", Biochem Biophys Res Commun (USA) (1972), 47(6)1402-7.

ALBR83a: Albrecht, G, K Hochstrasser, and OL Schonberger, "Kunitz-type proteinase inhibitors derived by limited proteolysis of the inter-α-trypsin inhibitor, IX: isolation and characterization of the inhibitory parts of inter-α-trypsin inhibitors from several mammalian sera", Hoppe-Seyler's Z Physiol Chem (1983), 364:1697-1702.

ALBR83b: Albrecht, GJ, K Hochstrasser, and J-P Salier, "Elastase inhibition by the inter-α-trypsin inhibitor and derived inhibitors of man and cattle", Hoppe-Seyler's Z Physiol Chem (1983), 364:1703-1708.

ALMA83a: Almassy, RC, JC Fontecilla-Camps, FL Suddath, and CE Bugg, "Structure of scorpion neurotoxin at 1.8 Å resolution", Entry 1SN3 in Brookhaven Protein Data Bank, (1983).

ALMA83b: Almassy, RC, JC Fontecilla-Camps, FL Suddath, and CE Bugg, "Structure of variant-3 scorpion neurotoxin from Centruroides sculpturatus Ewing refined at 1.8 Å resolution", J Mol Biol (1983), 170:497ff.

ALMQ89: Almquist, RG, SR Kadambi, DM Yasuda, FL Weitl, WE Polgar, and LR Toll, "Paralytic activity of (des-Glu1)conotoxin GI analogs in the mouse diaphragm", Int J Pept Protein Res, (Dec 1989), 34(6)455-62.

ANFI73: Anfinsen, CB, "Principles that govern the folding of protein chains", Science (1973), 181(96)223-30.

ARGO87: Argos, P, "Analysis of Sequence-similar Pentapeptides in Unrelated Protein Tertiary Structures", J Mol Biol (1987), 197:331-348.

ARAK90: Araki, K, M Kuwada, O Ito, J Kuroki, and S Tachibana, "Four disulfide bonds allocation of Na⁺,K⁺-ATPase inhibitor (SPAI)", Biochem Biophys Res Comm (1990), 172(1)42-46.

ARMS81: Armstrong, J, RN Perham, and JE Walker, "Domain structure of Bacteriophage fd Adsorption Protein", FEBS Lett (1981), 135(1)167-172.

ARMS83: Armstrong, J, JA Hewitt, and RN Perham, "Chemical modification of the coat protein in bacteriophage fd and orientation of the virion during assembly and disassembly", EMBO J (1983), 2(10)1641-6.

ARNA90: Arnaout, MA, "Leukocyte Adhesion Molecules Deficiency: Its Structural Basis, Pathophysiology and Implications for Modulating the Inflammatory Response", Immunological Reviews (1990), 114:.

AUER87: Auerswald, E-A, W Schroeder, and M Kotick, "Synthesis, Cloning and Expression of Recombinant Aprotinin", Biol Chem Hoppe-Seyler (1987), 368:1413-1425.

AUER88: Auerswald, E-A, D Hoerlein, G Reinhardt, W Schroder, and E Schnabel, "Expression, Isolation, and Characterization of Recombinant [Arg¹⁵,Glu⁵²]Aprotinin", Biol Chem Hoppe-Seyler (1988), 369(Supplement):27-35.

AUER89: Auerswald, E-A, W Bruns, D Hoerlein, G Reinhardt, E Schnabel, and W Schroder, "Variants of bovine pancreatic trypsin inhibitor produced by recombinant DNA technology", UK Patent Application GB 2,208,511 A.

AUER90: Auerswald, E-A, W Schroeder, E Schnabel, W Bruns, G Reinhard, and M Kotick, "Homologs of Aprotinin produced from a recombinant host, process expression vector and recombinant host therefor and pharmaceutical use thereof", U.S. Pat. No. 4,894,436 (16 Jan 1990).

AUSU87: Ausubel, FM, R Brent, RE Kingston, DD Moore, JG Seidman, JA Smith, and K Struhl, Editors, Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley-Interscience, Publishers: John Wiley & Sons, New York, 1987.

BAKE87: Baker, K, N Mackman, and IB Holland, "Genetics and Biochemistry of the Assembly of Proteins into the Outer Membrane of E. coli", Prog Biophys Molec Biol (1987), 49:89-115.

BALD85: Balduyck, M, M Davril, C Mizon, M Smyrlaki, A Hayem, and J Mizon, "Human urinary proteinase inhibitor: inhibitory properties and interaction with bovine trypsin", Biol Chem Hoppe-Seyler (1985), 366:9-14.

BANN81: Banner, DW, C Nave, and DA Marvin, "Structure of the protein and DNA in fd filamentous bacterial virus", Nature (1981), 289:814-816.

BARB85: Barbe, J, JA Vericat, M Llagostera, and R Guerrero, "Expression of the SOS genes of Escherichia coli in Salmonella typhimurium", Microbiologia (1985), 1(1-2)77-87.

BECK80: Beck, E, "Nucleotide sequence of the gene ompA coding the outer membrane protein II* of Escherichia coli K-12", Nucl Acid Res (1980), 8(13)3011-3024.

BECK83: Beckwith, J, and TJ Silhavy, "Genetic Analysis of Protein Export in Escherichia coli", Methods in Enzymology (1983), 97:3-11.

BECK88b: Beckmann, J, A Mehlich, W Schroeder, HR Wenzel, and H Tschesche, "Preparation of chemically 'mutated' aprotinin homologues by semisynthesis: P1 substitutions change inhibitory specificity", Eur J Biochem (1988), 176:675-82.

BECK89a: Beckmann, J, A Mehlich, W Schroeder, HR Wenzel, and H Tschesche, "Semisynthesis of Arg¹⁵, Glu¹⁵, Met¹⁵, and Nle¹⁵-Aprotinin Involving Enzymatic Peptide Bond Resynthesis", J Protein Chem (1989), 8(1)101-113.

BECK89b: Becker, S, E Atherton, H Michel, and RD Gordon, "Synthesis and characterization of conotoxin IIIa", J Protein Chem, (Jun 1989), 8(3)393-4.

BECK89c: Becker, S, E Atherton, and RD Gordon, "Synthesis and characterization of mu-conotoxin IIIa", Eur J Biochem, (Oct 20 1989), 185(1)79-84.

BENS84: Benson, SA, E Bremer, and TJ Silhavy, "Intragenic regions required for LamB export", Proc Natl Acad Sci USA (1984), 81:3830-34.

BENS87b: Benson, SA, and E Bremer, "In vivo selection and characterization of internal deletions in the lamB::lacZ gene fusion", Gene (1987), 52(2-3)165-73.

BENS87c: Benson, SA, MN Hall, and BA Rasmussen, "Signal Sequence Mutations That Alter Coupling of Secretion and Translation of an Escherichia coli Outer Membrane Protein", J Bacteriol (1987), 169(10)4686-91.

BENS88: Benson, SA, JL Occi, BA Sampson, "Mutations that alter the pore function of the OmpF porin of Escherichia coli K12", J Mol Biol (1988), 203(4)961-70.

BENZ88a: Benz, R, and K Bauer, "Permeation of hydrophilic molecules through the outer membrane of gram-negative bacteria", Eur J Biochem (1988), 176:1-19.

BENZ88b: Benz, R, "Structure and Function of Porins from Gram-Negative Bacteria", Ann Rev Microbiol (1988), 42:359-93.

BERG88: Berg, JM, "Proposed structure for the zinc-binding domains from transcription factor IIIA and related proteins", Proc Natl Acad Sci USA (1988), 85:99-102.

BETT88: Better, M, CP Chang, RR Robinson, and AH Horwitz, "Escherichia coli Secretion of an Active Chimeric Antibody Fragment", Science (1988), 240:1041-1043.

BHAT86: Bhatnagar, PK, and JC Frantz, "Synthesis and Antigenic activity of E. coli ST and its analogues", Develop Biol Standard (1986), 63:79-87.

BIRD67: Birdsell, DC, and EH Cota-Robles, "Production and Ultrastructure of lysozyme and ethylenediaminetetraacetate-lysozyme spheroplasts of E. coli", J Bacteriol (1967), 93:427-437.

BIET86: Bieth, JG, "Elastase: Catalytic and Biological Properties", pp. 217-320 in Regulation of Matrix Accumulation, Editor: RP Mecham, Academic Press, Orlando, 1986.

BLOW72: J Mol Biol (1972), 69:137ff.

BODE89: Bode, W, HJ Greyling, R Huber, J Otlewski, and T Wilusz, "The refined 2.0 Å X-ray crystal structure of the complex formed between bovine beta-trypsin and CMTI-I, a trypsin inhibitor from squash seeds (Cucurbita maxima). Topological similarity of the squash seed inhibitors with the carboxypeptidase A inhibitor from potatoes", FEBS Lett (Jan 2 1989), 242(2)285-92.

BOEK80: Boeke, JD, M Russel, and P Model, "Processing of Filamentous Phage Pre-coat Protein: Effect of Sequence Variations near the Signal Peptidase Cleavage Site", J Mol Biol (1980), 144:103-116.

BOEK82: Boeke, JD, P Model, and ND Zinder, "Effects of Bacteriophage f1 Gene III Protein on the Host Cell Membrane", Molec and Gen Genet, (1982), 186:185-192.

BOQU87: Boquet, PL, C Manoil, and J Beckwith, "Use of TnphoA to Detect Genes for Exported Proteins in Escherichia coli: Identification of the Plasmid-Encoded Gene for a Periplasmic Acid Phosphatase", J Bacteriol (1987), 169:1663-1669.

BOTS85: Botstein, D, and D Shortle, "Strategies and applications of in vitro mutagenesis", Science, (1985), 229(4719)1193-201.

BOUG84: Bouges-Bocquet, B, H Villarroya, and M Hofnung, "Linker Mutagenesis in the Gene of an Outer Membrane Protein of Escherichia coli, LamB", J Cellular Biochem (1984), 24:217-28.

BOUL86a: Boulain, JC, A Charbit, and M Hofnung, "Mutagenesis by random linker insertion into the lamB gene of Escherichia coli K12", Mol Gen Genet, (1986), 205(2)339-48.

BRAW87: Brawerman, G, "Determinants of messenger RNA stability", Cell (1987), 48(1)5-6.

CALA90: Calamia, J, and C Manoil, "lac permease of Escherichia coli: topology and sequence elements promoting membrane insertion", Proc Natl Acad Sci USA, (Jul 1990), 87(13)4937-41.

CAMP90: Campanelli, D, M Melchior, Yiping Fu, M Nakata, H Shuman, C Nathan, and JE Gabay, "Cloning of cDNA for Proteinase 3: A Serine Protease, Antibiotic, and Autoantigen from Human Neutrophils", J Exp Med (Dec 1990), 172:1709-15.

CARM90: Carmel, G, D Hellstern, D Henning, and JW Coulton, "Insertion mutagenesis of the gene encoding the ferrichrome-iron receptor of Escherichia coli K-12", J Bacteriol, (Apr 1990), 172(4)1861-9.

CARU85: Caruthers, MH, "Gene Synthesis Machines: DNA Chemistry and Its Uses", Science (1985), 230:281-285.

CARU87: Caruthers, MH, P Gottlieb, LP Bracco, and L Cummings, "The Thymine 5-Methyl Group: A Protein-DNA Contact Site Useful for Redesigning Cro Repressor to Recognize a New Operator", in Protein Structure, Folding, and Design 2, 1987, Ed. D Oxender (New York, AR Liss Inc) p. 9ff.

CAST79: Castillo, MJ, K Nakajima, M Zimmerman, and JC Powers, "Sensitive substrates for human leukocyte and porcine pancreatic elastase: a study of the merits of various chromophoric and fluorogenic leaving groups in assays for serine proteases", Anal Biochem (1979), 99(1)53-64.

CATR87: Catron, KM, and CA Schnaitman, "Export of Protein in Escherichia coli: a Novel Mutation in ompC Affects Expression of Other Major Outer Membrane Proteins", J Bacteriol (1987), 169:4327-34.

CHAM82: Chambers, RW, I Kucan, and Z Kucan, "Isolation and characterization of phi-X174 mutants carrying lethal missense mutations in gene G", Nucleic Acids Res (1982), 10(20)6465-73.

CHAN79: Chang, CN, P Model, and G Blobel, "Membrane biogenesis: Cotranslational integration of the bacteriophage f1 coat protein into an Escherichia coli membrane fraction", Proc Natl Acad Sci USA (1979), 76:1251-1255.

CHAP90: Chapot, MP, Y Eshdat, S Marullo, JG Guillet, A Charbit, AD Strosberg, and C Delavier-Klutchko, "Localization and characterization of three different beta-adrenergic receptors expressed in Escherichia coli", Eur J Biochem (1990), 187(1)137-44.

CHAR84: Charbit, A, J-M Clement, and M Hofnung, "Further Sequence Analysis of the Phage Lambda Receptor Site", J Mol Biol (1984), 175:395-401.

CHAR86a: Charbit, A, JC Boulain, A Ryter, and M Hofnung, "Probing the topology of a bacterial membrane protein by genetic insertion of a foreign epitope; expression at the cell surface", EMBO J, (1986), 5(11)3029-37.

CHAR86b: Charbit, A, J-C Boulain, and M Hofnung, "Une methode genetique pour exposer un epitope choisi a la surface de la bacterie Escherichia coli. Perspectives [A genetic method to expose a chosen epitope on the surface of the bacteria E. coli]", Comptes Rendus Acad Sci, Paris, (1986), 302:617-24.

CHAR87: Charbit, A, E Sobczak, ML Michel, A Molla, P Tiollais, and M Hofnung, "Presentation of two epitopes of the preS2 region of hepatitis B virus on live recombinant bacteria", J Immunol (1987), 139:1658-64.

CHAR88a: Charbit, A, K Gehring, H Nikaido, T Ferenci, and M Hofnung, "Maltose transport and starch binding in phage-resistant point mutants of maltoporin. Functional and topological implications", J Mol Biol (1988), 201(3)487-96.

CHAR88b: Charbit, A, A Molla, W Saurin, and M Hofnung, "Versatility of a vector for expressing foreign polypeptides at the surface of gram-negative bacteria", Gene (1988), 70(1)181-9.

CHAR88c: Charbit, A, S Van der Werf, V Mimic, JC Boulain, M Girard, and M Hofnung, "Expression of a poliovirus neutralization epitope at the surface of recombinant bacteria: first immunization results", Ann Inst Pasteur Microbiol (1988), 139(1)45-58.

CHAR90: Charbit, A, A Molla, J Ronco, JM Clement, V Favier, EM Bahraoui, L Montagnier, A Leguern, and M Hofnung, "Immunogenicity and antigenicity of conserved peptides from the envelope of HIV-1 expressed at the surface of recombinant bacteria", AIDS (1990), 4(6)545-51.

CHAV88: Chavrier, P, P Lemaire, O Revelant, R Bravo, and P Charnay, "Characterization of a Mouse Multigene Family That Encodes Zinc Finger Structures", Molec Cell Biol (1988), 8(3)1319-26.

CHAZ85: Chazin, WJ, DP Goldenberg, TE Creighton, and K Wuthrich, "Comparative studies of conformation and internal mobility in native and circular basic pancreatic trypsin inhibitor by ¹H nuclear magnetic resonance in solution", Eur J Biochem (1985), 152(2)429-37.

CHOT75: Chothia, C, and J Janin, "Principles of protein-protein recognition", Nature (1975), 256:705-708.

CHOT76: Chothia, C, S Wodak, and J Janin, "Role of subunit interfaces in the allosteric mechanism of hemoglobin", Proc Natl Acad Sci USA (1976), 73:3793-7.

CHOU74: Chou, PY, and GD Fasman, "Prediction of protein conformation", Biochemistry (1974), 13(2)222-45.

CHOU78a: Chou, PY, and GD Fasman, "Prediction of the secondary structure of proteins from their amino acid sequence", Adv Enzymol (1978), 47:45-148.

CHOU78b: Chou, PY, and GD Fasman, "Empirical predictions of protein conformation", Annu Rev Biochem (1978), 47:251-76.

CHOW87: Chowdhury, K, U Deutsch, and P Gruss, "A Multigene Family Encoding Several 'Finger' Structures Is Present and Differentially Active in Mammalian Genomes", Cell (1987), 48:771-778.

CLEM81: Clement, JM, and M Hofnung, "The sequence of the lambda receptor, an outer membrane protein of E. coli K12", Cell (1981), 27:507-514.

CLEM83: Clement, JM, E Lepouce, C Marchal, and M Hofnung, "Genetic Study of a membrane protein: DNA sequence alterations due to 17 LamB point mutations affecting adsorption of phage lambda", EMBO J (1983), 2:77-80.

CLIC88: Click, EM, GA McDonald, and CA Schnaitman, "Translational Control of Exported Proteins That Results from OmpC Porin Overexpression", J Bacteriol (1988), 170:2005-2011.

CLOR86: Clore, GM, AT Brunger, M Karplus, AM Gronenborn, "Application of Molecular Dynamics with Interproton Distance Restraints to Three-dimensional Protein Structure Determination: A model study of Crambin", J Mol Biol (1986), 191:523-551.

CLOR87a: Clore, GM, AM Gronenborn, M Kjaer, and FM Poulsen, "The determination of the three-dimensional structure of barley serine proteinase inhibitor 2 by nuclear magnetic resonance distance geometry and restrained molecular dynamics", Protein Engineering (1987), 1(4)305-311.

CLOR87b: Clore, GM, AM Gronenborn, MNG James, M Kjaer, CA McPhalen, and FM Poulsen, "Comparison of the solution and X-ray structures of barley serine proteinase inhibitor 2", Protein Engineering (1987), 1(4)313-318.

CLUN84: Clune, A, K-S Lee, and T Ferenci, "Affinity Engineering of Maltoporin: Variants with Enhanced Affinity for Particular Ligands", Biochem and Biophys Res Comm (1984), 121:34-40.

CREI74: Creighton, TE, "Intermediates in the Refolding of Reduced Pancreatic Trypsin Inhibitor", J Mol Biol (1974), 87:579-602.

CREI77a: Creighton, TE, "Conformational Restrictions on the Pathway of Folding and Unfolding of the Pancreatic Trypsin Inhibitor", J Mol Biol (1977), 113:275-293.

CREI77b: Creighton, TE, "Energetics of Folding and Unfolding of Pancreatic Trypsin Inhibitor", J Mol Biol (1977), 113:295-312.

CREI80: Creighton, TE, "Role of the Environment in the Refolding of Reduced Pancreatic Trypsin Inhibitor", J Mol Biol (1980), 144:521-550.

CREI84: Creighton, TE, Proteins: Structures and Molecular Principles, WH Freeman & Co, New York, 1984.

CREI87: Creighton, TE, and IG Charles, "Biosynthesis, Processing, and Evolution of Bovine Pancreatic Trypsin Inhibitor", Cold Spring Harb Symp Quant Biol (1987), 52:511-519.

CREI88: Creighton, TE, "Disulphide Bonds and Protein Stability", BioEssays (1988), 8(2)57-63.

CRIS84: Crissman, JW, and GP Smith, "Gene-III Protein of Filamentous Phages: Evidence for a Carboxyl-Terminal Domain with a Role in Morphogenesis", Virology (1984), 132:445-55.

CRUZ85: Cruz, LJ, WR Gray, BM Olivera, RD Zeikus, L Kerr, D Yoshikami, and E Moczydlowski, "Conus geographus toxins that discriminate between neuronal and muscle sodium channels", J Biol Chem, (1985), 260(16)9280-8.

CRUZ89: Cruz, LJ, G Kupryszewski, GW LeCheminant, WR Gray, BM Olivera, and J Rivier, "mu-Conotoxin GIIIA, a Peptide Ligand for Muscle Sodium Channels: Chemical Synthesis, Radiolabeling, and Receptor Characterization", Biochem (1989), 28:3437-3442.

CWIR90: Cwirla, SE, EA Peters, RW Barrett, and WJ Dower, "Peptides on Phage: A vast library of peptides for identifying ligands", Proc Natl Acad Sci USA, (August 1990), 87:6378-6382.

DAIL90: Dailey, D, GL Schieven, MY Lim, H Marquardt, T Gilmore, J Thorner, and GS Martin, "Novel yeast protein kinase (YPK1 gene product) is a 40-kilodalton phosphotyrosyl protein associated with protein-tyrosine kinase activity", Mol Cell Biol (Dec 1990), 10(12)6244-56.

DALL90: Dallas, WS, "The Heat-Stable Toxin I Gene from Escherichia coli 8D", J Bacteriol (1990), 172(9)5490-93.

DARG88: Dargent, B, A Charbit, M Hofnung, and F Pattus, "Effect of point mutations on the in-vitro pore properties of maltoporin, a protein of Escherichia coli outer membrane", J Mol Biol (1988), 201(3)497-506.

DAWK86: Dawkins, R, The Blind Watchmaker, W W Norton & Co, New York, 1986.

DAYL88: Day, LA, CJ Marzec, SA Reisberg, and A Casadevall, "DNA Packing in Filamentous Bacteriophage", Ann Rev Biophys Biophys Chem (1988), 17:509-39.

DAYR86: Dayringer, H, A Tramontano, and R Fletterick, "Proteus Software for Molecular Modeling", p. 5-8 in Computer Graphics and Molecular Modeling, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1986.

DEBR86: Debro, L, PC Fitz-James, and A Aronson, "Two different parasporal inclusions are produced by Bacillus thuringiensis subsp. finitimus.", J Bacteriol (1986), 165:258-68.

DEGE84: de Geus, P, HM Verheij, NH Reigman, WPM Hoekstra, and GH de Haas, "The pro- and mature forms of the E. coli K-12 outer membrane phospholipase A are identical", EMBO J (1984), 3(8)1799-1802.

DEGR87: DeGrado, WF, L Regan, and SP Ho, "The Design of a Four-helix Bundle Protein", Cold Spring Harbor Symp Quant Biol, (1987), 52:521-6.

DELA88: de la Cruz, VF, AA Lal, and TF McCutchan, "Immunogenicity and epitope mapping of foreign sequences via genetically engineered filamentous phage", J Biol Chem, (1988), 263(9)4318-22.

DENH78: Denhardt, DT, D Dressler, and DS Ray, editors, The Single-Stranded DNA Phages, Cold Spring Harbor Laboratory, 1978.

DEVL90: Devlin, JJ, LC Panganiban, and PE Devlin, "Random Peptide Libraries: A Source of Specific Protein Binding Molecules", Science, (27 July 1990), 249:404-406.

DEVO78: DeVore, DP, and RJ Gruebel, "Dityrosine in adhesive formed by the sea mussel, Mytilus edulis", Biochem Biophys Res Commun (1978), 80(4)993-9.

DEVR84: de Vries, G, CK Raymond, and RA Ludwig, "Extension of bacteriophage λ host range: Selection, cloning, and characterization of a constitutive λ receptor gene", Proc Natl Acad Sci USA (1984), 81:6080-4.

DIAR90: Diarra-Mehrpour, M, J Bourguignon, R Sesboue, J-P Salier, T Leveillard, and J-P Martin, "Structural analysis of the human inter-α-trypsin inhibitor light-chain gene", Eur J Biochem (1990), 191:131-139.

DICK83: Dickerson, RE, and I Geis, Hemoglobin: Structure, Function, Evolution, and Pathology, The Benjamin/Cummings Publishing Co, Menlo Park, Calif., 1983.

DILL87: Dill, KA, "Protein Surgery", Protein Engineering (1987), 1:369-371.

DOUG84: Dougan, G, and P Morrissey, "Molecular analysis of the virulence determinants of enterotoxigenic Escherichia coli isolated from domestic animals: applications for vaccine development", Vet Microbiol (1984/5), 10:241-57.

DONO87: Donovan, W, Z Liangbiao, K Sandman, and R Losick, "Genes Encoding Spore Coat Polypeptides from Bacillus subtilis", J Mol Biol (1987), 196:1-10.

DUCH88: Duchene, M, A Schweizer, F Lottspeich, G Krauss, M Marget, K Vogel, B-U von Specht, and H Domdey, "Sequence and Transcriptional Start Site of the Pseudomonas aeruginosa Outer Membrane Porin Protein F Gene", J Bacteriol (1988), 170:155-162.

DUFT85: Dufton, MJ, "Proteinase inhibitors and dendrotoxins", Eur J Biochem (1985), 153:647-654.

DULB86: Dulbecco, R, "Viruses with Recombinant Surface Proteins", U.S. Pat. No. 4,593,002, Jun. 3, 1986.

DUPL88: Duplay, P, and M Hofnung, "Two Regions of Mature Periplasmic Maltose-Binding Protein of Escherichia coli Involved in Secretion", J Bacteriol (1988), 170(10)4445-50.

DWAR89: Dwarakanath, P, SS Visweswariah, YVBK Subrahmanyam, G Shanthi, HM Jagannatha, and TS Balganesh, "Cloning and hyperexpression of a gene encoding the heat-stable toxin of Escherichia coli", Gene (1989), 81:219-226.

EHRM90: Ehrmann, M, D Boyd, and J Beckwith, "Genetic analysis of membrane protein topology by a sandwich gene fusion approach", Proc Natl Acad Sci USA, (Oct 1990), 87(19)7574-8.

EIGE90: Eigenbrot, C, M Randal, and AA Kossiakoff, "Structural effects induced by removal of a disulfide bridge: the X-ray structure of the C30A/C51A mutant of basic pancreatic trypsin inhibitor at 1.6 Å", Protein Engineering (1990), 3(7)591-598.

EISE85: Eisenbeis, SJ, MS Nasoff, SA Noble, LP Bracco, DR Dodds, MH Caruthers, "Altered Cro Repressors from engineered mutagenesis of a synthetic cro gene", Proc Natl Acad Sci USA (1985), 82:1084-1088.

ELLE88: Elleman, TC, "Pilins of Bacteroides nodosus: molecular basis of serotypic variation and relationships to other bacterial pilins", Microbiol Rev (1988), 52(2)233-47.

EMPI82: Empie, MW, and M Laskowski, Jr, "Thermodynamics and Kinetics of Single Residue Replacements in Avian Ovomucoid Third Domains: Effect on Inhibitor Interactions with Serine Proteinases", Biochemistry (1982), 21:2274-84.

ENGH89: Enghild, JJ, IB Thogersen, SV Pizzo, and G Salvesen, "Analysis of inter-α-trypsin inhibitor and a novel inhibitor, pre-α-trypsin inhibitor, from human plasma: polypeptide chain stoichiometry and assembly by glycan", J Biol Chem (1989), 264:15975-15981.

EPST63: Epstein, CJ, RF Goldberger, and CB Anfinsen, Cold Spr Harb Symp Quant Biol (1963), 28:439ff.

ERIC86: Erickson, BW, SB Daniels, PA Reddy, CG Unson, JS Richardson, and DC Richardson, "Betabellin: An Engineered Protein", Current Communications in Molecular Biology: Computer Graphics and Molecular Modeling, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1986, Fletterick, R and M Zoller, Editors.

EVAN88: Evans, RM, and SM Hollenberg, "Zinc Fingers: Gilt by Association", Cell (1988), 52:1-3.

FAVE89: Favel, A, D Le-Nguyen, MA Coletti-Previero, and C Castro, "Active site chemical mutagenesis of Ecballium elaterium Trypsin Inhibitor II: New microproteins inhibiting elastase and chymotrypsin", Biochem Biophys Res Comm (1989), 162:79-82.

FERE80c: Ferenci, T, "The recognition of maltodextrins by Escherichia coli", Eur J Biochem (1980), 108:631-6.

FERE82a: Ferenci, T, "Affinity-chromatographic Studies based on the Binding specificity of the Lambda Receptor of Escherichia coli", Ann Microbiol (Inst Pasteur) (1982), 133A:167-169.

FERE82b: Ferenci, T, and K-S Lee, "Directed Evolution of the Lambda Receptor of Escherichia coli through Affinity Chromatographic Selection", J Mol Biol (1982), 160:431-444.

FERE83: Ferenci, T, and KS Lee, "Isolation by affinity chromatography, of mutant Escherichia coli cells with novel regulation of lamB expression", J Bacteriol (1983), 154:984-987.

FERE84: Ferenci, T, "Genetic manipulation of bacterial surfaces through affinity-chromatographic selection", Trends in Biological Science (1984) Vol. ?:44-48.

FERE86a: Ferenci, T, and K-S Lee, "Temperature-Sensitive Binding of α-Glucans by Bacillus stearothermophilus", J Bacteriol (1986), 166:95-99.

FERE86b: Ferenci, T, M Muir, K-S Lee, and D Maris, "Substrate specificity of the Escherichia coli maltodextrin transport system and its component proteins.", Biochimica et Biophysica Acta (1986), 860:44-50.

FERE89a: Ferenci, T, and KS Lee, "Channel architecture in maltoporin: dominance studies with lamB mutations influencing maltodextrin binding provide evidence for independent selectivity filters in each subunit", J Bacteriol (1989), 171(2)855-61.

FERE89b: Ferenci, T, and S Stretton, "Cysteine-22 and cysteine-38 are not essential for the function of maltoporin (LamB protein)", FEMS Microbiol Lett (1989), 52(3)335-9.

FERR90: Ferrer-Lopez, P, P Renesto, M Schattner, S Bassot, P Laurent, and M Chignard, "Activation of human platelets by C5a-stimulated neutrophils: a role for cathepsin G", American J Physiology (1990), 258:C1100-C1107.

FIOR85: Fioretti, E, G Iacopino, M Angeletti, D Barra, F Bossa, and F Ascoli, "Primary Structure and Antiproteolytic Activity of a Kunitz-type Inhibitor from Bovine Spleen", J Biol Chem (1985), 260:11451-11455.

FIOR88: Fioretti, E, M Angeletti, L Fiorucci, D Barra, F Bossa, and F Ascoli, "Aprotinin-Like Isoinhibitors in Bovine Organs", Biol Chem Hoppe-Seyler (1988), 369(Suppl)37-42.

FRAN87: Frankel, AD, JM Berg, and CO Pabo, "Metal-dependent folding of a single zinc finger from transcription factor IIIA", Proc Natl Acad Sci USA (1987), 84:4841-45.

FRAN88: Frankel, A, and CO Pabo, "Fingering Too Many Proteins", Cell(1988), 53:675.

FRAN89: Franconi, GM, PD Graf, SC Lazarus, JA Nadel, GH Caughey, "MastCell Tryptase and Chymase Reverse Airway Smooth Muscle RelaxationInduced by Vasoactive Intestinal Peptide in the Ferret", J Pharmacol andExp Therap (1989), 248(3)947-51.

FREI90: Freimuth, PI, JW Taylor, and ET Kaiser, "Introduction of GuestPeptides into Escherichia coli Alkaline Phosphatase", J Biol Chemistry,(15 January 1990), 265(2)896-901.

FREU89: Freudl, R, H Schwarz, M Degen, and U Henning, "A lower sizelimit exists for export of fragments of an outer membrane protein (OmpA)of Escherichia coli K-12", J Mol Biol (1989), 205(4)771-5.

FRIT85: Fritz, H-J, "The Oligonucleotide-directed Construction ofMutations in Recombinant Filamentous Phage", DNA Cloning, Editor: DMGlover, IRL Press, Oxford, UK, 1985.

GARI84: Gariepy, J, P O'Hanley, SA Waldman, F Murad, and GK Schoolnik,"A common antigenic determinant found in two functionally unrelatedtoxins", J Exp Med, (1984), 160(4)1253-8.

GARI86: Gariepy, J, A Lane, F Frayman, D Wilbur, W Robien, G Schoolnik, and O Jardetzky, "Structure of the Toxic Domain of the Escherichia coli Heat-Stable Enterotoxin ST I", Biochem (1986), 25:7854-7866.

GARI87: Gariepy, J, AK Judd, and GK Schoolnik, "Importance of disulfide bridges in the structure and activity of Escherichia coli enterotoxin ST1b", Proc Natl Acad Sci USA (1987), 84:8907-11.

GAUS87: Gauss, P, KB Krassa, DS McPheeters, MA Nelson, and L Gold, "Zinc(II) and the single-stranded DNA binding protein of bacteriophage T4", Proc Natl Acad Sci USA (1987), 84:8515-19.

GEBH86: Gebhard, W, and K Hochstrasser, "Inter-α-trypsin inhibitor and its close relatives", in Barrett and Salvesen (eds.) Protease Inhibitors (1986) Elsevier Science Publishers BV (Biomedical Division) pp. 389-401.

GEBH90: Gebhard, W, K Hochstrasser, H Fritz, JJ Enghild, SV Pizzo, and G Salvesen, "Structure of the inter-α-inhibitor (inter-α-trypsin inhibitor) and pre-α-inhibitor: current state and proposition of a new terminology", Biol Chem Hoppe-Seyler (1990), 371, suppl 13-22.

GEHR87: Gehring, K, A Charbit, E Brissaud, and M Hofnung, "Bacteriophage lambda receptor site on the Escherichia coli K-12 LamB protein", J Bacteriol (1987), 169(5)2103-6.

GERD84: Gerday, C, M Herman, J Olivy, N Gerardin-Otthiers, D Art, E Jacquemin, A Kaeckenbeeck, and J van Beeumen, "Isolation and characterization of the Heat Stable enterotoxin from a pathogenic bovine strain of Escherichia coli", Vet Microbiol (1984), 9:399-414.

GETZ88: Getzoff, ED, HE Parge, DE McRee, and JA Tainer, "Understanding the Structure and Antigenicity of Gonococcal Pili", Rev Infect Dis (1988), 10(Suppl 2)S296-299.

GIBS88: Gibson, TJ, JPM Postma, RS Brown, and P Argos, "A model for the tertiary structure of the 28 residue DNA-binding motif (`Zinc finger`) common to many eukaryotic transcriptional regulatory proteins", Protein Engineering (1988), 2(3)209-218.

GIRA89: Girard, TJ, LA Warren, WF Novotny, KM Likert, SG Brown, JP Miletich, and GJ Broze Jr, "Functional significance of the Kunitz-type inhibitory domains of lipoprotein-associated coagulation inhibitor", Nature (1989), 338:518-20.

GOLD83: Goldenberg, DP, and TE Creighton, "Circular and circularly permuted forms of bovine pancreatic trypsin inhibitor.", J Mol Biol (1983), 165(2)407-13.

GOLD84: Goldenberg, DP, and TE Creighton, "Folding Pathway of a circular Form of Bovine Pancreatic Trypsin Inhibitor", J Mol Biol (1984), 179:527-45.

GOLD85: Goldenberg, DP, "Dissecting the Roles of Individual Interactions in Protein Stability: Lessons From a Circularized Protein", J Cellular Biochem (1985), 29:321-335.

GOLD87: Gold, L, and G Stormo, "Translation Initiation", Volume 2, Chapter 78, p 1302-1307, Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, Neidhardt, FC, Editor-in-Chief, Amer Soc for Microbiology, Washington, DC, 1987.

GOLD88: Goldenberg, DP, "Kinetic Analysis of the Folding and Unfolding of a Mutant Form of Bovine Pancreatic Trypsin Inhibitor Lacking the Cysteine-14 and -38 Thiols", Biochem (1988), 27:2481-89.

GOTT87: Gottesman, S, "Regulation by Proteolysis", Volume 2, Chapter 79, p 1308-1312, Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, Neidhardt, FC, Editor-in-Chief, Amer Soc for Microbiology, Washington, DC, 1987.

GRAY81a: Gray, WR, A Luque, BM Olivera, J Barrett, and LJ Cruz, "Peptide Toxins from Conus geographus Venom", J Biol Chem (1981), 256:4734-40.

GRAY81b: Gray, CW, RS Brown, and DA Marvin, "Adsorption Complex of Filamentous Virus", J Mol Biol (1981), 146:621-627.

GRAY83: Gray, WR, JE Rivier, R Galyean, LJ Cruz, and BM Olivera, "Conotoxin MI. Disulfide bonding and conformational states", J Biol Chem, (1983), 258(20)12247-51.

GRAY84: Gray, WR, FA Luque, R Galyean, E Atherton, RC Sheppard, BL Stone, A Reyes, J Alford, M McIntosh, BM Olivera, et al., "Conotoxin GI: disulfide bridges, synthesis, and preparation of iodinated derivatives", Biochemistry, (1984), 23(12)2796-802.

GRAY88: Gray, WR, and BM Olivera, "Peptide Toxins from Venomous Conus Snails", Ann Rev Biochem (1988), 57:665-700.

GREC79: Greco, WR, and MT Hakala, "Evaluation of Methods for Estimating the Dissociation Constant of Tight Binding Enzyme Inhibitors", J Biol Chem (1979), 254:12104-109.

GREE53: Green, NM, and E Work, "Pancreatic Trypsin Inhibitor: 2. Reactions with Trypsin", Biochem J (1953), 54:347-52.

GUAR89: Guarino, A, R Giannella, and MR Thompson, "Citrobacter freundii Produces an 18-Amino-Acid Heat-Stable Enterotoxin Identical to the 18-amino-acid Escherichia coli Heat-Stable Enterotoxin (ST Ia)", Infection and Immunity (1989), 57(2)649-52.

GUDM89: Gudmundsdottir, A, PE Bell, MD Lundrigan, C Bradbeer, and RJ Kadner, "Point mutations in a conserved region (TonB box) of Escherichia coli outer membrane protein BtuB affect vitamin B12 transport", J Bacteriol, (Dec 1989), 171(12)6526-33.

GUPT90: Gupta, SK, JL Niles, RT McCluskey, MA Arnaout, "Identity of Wegener's autoantigen (p29) with proteinase 3 and myeloblastin", Blood (Nov 15 1990), 76(10)2162.

GUSS88: Guss, JM, EA Merritt, RP Phizackerley, R Hedman, M Murata, KO Hodgson, HC Freeman, "Phase Determination by Multiple-Wavelength X-ray Diffraction: Crystal Structure of a Basic "Blue" Copper Protein from Cucumbers", Science (1988), 241:806-11.

GUZM87: Guzman-Verduzco, L-M, and YM Kupersztoch, "Fusion of Escherichia coli Heat-Stable Enterotoxin and Heat-Labile Enterotoxin B Subunit", J Bacteriol (1987), 169:5201-8.

GUZM89: Guzman-Verduzco, L-M, and YM Kupersztoch, "Rectification of Two Escherichia coli Heat-Stable Enterotoxin Allele Sequences and Lack of Biological Effect of Changing the Carboxy-Terminal Tyrosine to Histidine", Infection and Immunity (1989), 57(2)645-48.

GUZM90: Guzman-Verduzco, L-M, and YM Kupersztoch, "Export and processing analysis of a fusion between the extracellular heat-stable enterotoxin and the periplasmic B subunit of the heat-labile enterotoxin in Escherichia coli", Molec Microbiol (1990), 4:253-64.

HALL82: Hall, MN, M Schwartz, and TJ Silhavy, "Sequence Information within the lamB Gene is Required for Proper Routing of the Bacteriophage λ Receptor Protein to the Outer Membrane of Escherichia coli K-12", J Mol Biol (1982), 156:93-112.

HANC87: Hancock, REW, "Role of Porins in Outer Membrane Permeability", J Bacteriol (1987), 169:929-33.

HARD90: Hard, T, E Kellenbach, R Boelens, BA Maler, K Dahlman, LP Freedman, J Carlstedt-Duke, KR Yamamoto, J-A Gustafsson, and R Kaptein, "Solution Structure of the Glucocorticoid Receptor DNA-Binding Domain", Science (13 July 1990), 249:157-60.

HARK86: Harkki, A, TR Hirst, J Holmgren, and ET Palva, "Expression of the Escherichia coli lamB gene in Vibrio cholerae", Microb Pathog (1986), 1(3)283-8.

HARK87: Harkki, A, H Karkku, and ET Palva, "Use of lambda vehicles to isolate ompC-lacZ gene fusions in Salmonella typhimurium LT2", Mol Gen Genet (1987), 209(3)607-11.

HASH85: Hashimoto, K, S Uchida, H Yoshida, Y Nishiuchi, S Sakakibara, and K Yukari, "Structure-activity relations of conotoxins at the neuromuscular junction", Eur J Pharmacol (1985), 118(3)351-4.

HATA90: Hatanaka, Y, E Yoshida, H Nakayama, and Y Kanaoka, "Synthesis of mu-conotoxin GIIIA: a chemical probe for sodium channels", Chem Pharm Bull (Tokyo), (Jan 1990), 38:236-8.

HECH90: Hecht, MH, JS Richardson, DC Richardson, and RC Ogden, "De Novo Design, Expression, and Characterization of Felix: A Four-Helix Bundle Protein of Native-Like Sequence", Science, (24 Aug 1990), 249:884-91.

HEDE89: Hedegaard, L, and P Klemm, "Type 1 fimbriae of Escherichia coli as carriers of heterologous antigenic sequences", Gene, (Dec 21 1989), 85(1)115-24.

HEIJ90: Heijne, G von, and C Manoil, "Review: Membrane proteins: from sequence to structure", Protein Engineering (1990), 4(2)109-112.

HEIN87: Heine, HG, J Kyngdon, and T Ferenci, "Sequence determinants in the lamB gene of Escherichia coli influencing the binding and pore selectivity of maltoporin.", Gene (1987), 53:287-92.

HEIN88: Heine, HG, G Francis, KS Lee, and T Ferenci, "Genetic analysis of sequences in maltoporin that contribute to binding domains and pore structure.", J Bacteriol (April 1988), 170:1730-8.

HEIT89: Heitz, A, L Chiche, D Le-Nguyen, and B Castro, "¹H 2D NMR and Distance Geometry Study of the Folding of Ecballium elaterium Trypsin Inhibitor, a Member of the Squash Inhibitor Family", Biochem (1989), 28:2392-98.

HENR87: Henriksen, AZ, and JA Maeland, "The Porin Protein of the Outer Membrane of Escherichia coli: Reactivity in Immunoblotting, Antibody-binding by the Native Protein, and Cross-Reactivity with other Enteric Bacteria", Acta path microbiol immunol scand, Sect B (1987), 95:315-321.

HIDA90: Hidaka, Y, K Sato, H Nakamura, J Kobayashi, Y Ohizumi, and Y Shimonishi, "Disulfide Pairings in geographutoxin I, a peptide neurotoxin from Conus geographus", FEBS Lett (1990), 264(1)29-32.

HILL89: Hillyard, DR, BM Olivera, S Woodward, GP Corpuz, WR Gray, CA Ramilo, LJ Cruz, "A Molluscivorous Conus Toxin: Conserved Framework in Conotoxins", Biochem (1989), 28:358-61.

HINE80: Hines, JC, and DS Ray, "Construction and characterization of new coliphage M13 cloning vectors.", Gene (1980), 11:(3-4)207-18.

HOCH84: Hochstrasser, K, and E Wachter, "Elastase inhibitors, a process for their preparation and medicaments containing these inhibitors", U.S. Pat. No. 4,485,100 (27 Nov 1984).

HOCJ85: Ho, C, M Jasin, and P Schimmel, "Amino acid replacements that compensate for a large polypeptide deletion in an enzyme", Science (1985), 229:389-93.

HOJI82: Hojima, Y, JV Pierce, and JJ Pisano, "Pumpkin Seed Inhibitor of Human Factor XII_(a) (activated Hageman Factor) and Bovine Trypsin", Biochem (1982), 21:3741-46.

HOLA89a: Holak, TA, D Gondol, J Otlewski, and T Wilusz, "Determination of the Complete Three-Dimensional Structure of the Trypsin Inhibitor from Squash Seeds in Aqueous Solution by Nuclear Magnetic Resonance and a Combination of Distance Geometry and Dynamic Simulated Annealing", J Mol Biol (1989), 210:635-648.

HOLA89b: Holak, TA, W Bode, R Huber, J Otlewski, and T Wilusz, "Nuclear magnetic resonance solution and X-ray structures of squash trypsin inhibitor exhibit the same conformation of the proteinase binding loop", J Mol Biol (Dec 5 1989), 210(3)649-54. ##STR728##

Int J Peptide Protein Res (1989), 34:346-51.

HOOP87: Hoopes, BC, and WR McClure, "Strategies in Regulation of Transcription Initiation", Volume 2, Chapter 75, p 1231-1240, Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, Neidhardt, FC, Editor-in-Chief, Amer Soc for Microbiology, Washington, DC, 1987.

HOUG84: Houghten, RA, JM Ostresh, and FA Klipstein, "Chemical synthesis of an octadecapeptide with the biological and immunological properties of human heat-stable Escherichia coli enterotoxin", Eur J Biochem (1984), 145:157-162.

HUBB86: Hubbard, RC, and RG Crystal, "Antiproteases and Antioxidants: Strategies for the Pharmacologic Prevention of Lung Destruction", Respiration (1986), 50(Suppl 1)56-73.

HUBB89: Hubbard, RC, MA Casolaro, M Mitchell, SE Sellers, F Arabia, MA Matthay, and RG Crystal, "Fate of aerosolized recombinant DNA-produced α-1-antitrypsin: Use of the epithelial surface of the lower respiratory tract to administer proteins of therapeutic importance", Proc Natl Acad Sci USA (1989), 86:680-4.

HUBE74: Huber, R, D Kukla, W Bode, P Schwager, K Bartels, J Deisenhofer, and W Steigemann, "Structure of the Complex formed by Bovine Trypsin and Bovine Pancreatic Trypsin Inhibitor", J Mol Biol (1974), 89:73-101.

HUBE75: Huber, R, W Bode, D Kukla, and U Kohl, "The Structure of the Complex Formed by Bovine Trypsin and Bovine Pancreatic Trypsin Inhibitor: III. Structure of the Anhydrotrypsin-Inhibitor Complex", Biophys Struct Mechan (1975), 1:189-201.

HUBE77: Huber, R, W Bode, D Kukla, U Kohl, CA Ryan, "The structure of the complex formed by bovine trypsin and bovine pancreatic trypsin inhibitor III. Structure of the anhydro-trypsin-inhibitor complex.", Biophys Struct Mech (1975), 1(3)189-201.

HUTC87: Hutchinson, DCS, "The role of proteases and antiproteases in bronchial secretions", Eur J Respir Dis (1987), 71(Suppl.153)78-85.

HYNE90: Hynes, TR, M Randal, LA Kenedy, C Eigenbrot, and AA Kossiakoff, "X-ray crystal structure of the protease inhibitor domain of Alzheimer's amyloid beta-protein precursor", Biochemistry (1990), 29:10018-10022.

ILIC89: Il'ichev, AA, OO Minenkova, SI Tat'kov, NN Karpyshev, AM Eroshkin, VA Petrenko, and LS Sandakhchiev, "[Production of a viable variant of the M13 phage with a foreign peptide inserted into the basic coat protein] <Original> Poluchenie zhiznesposobnogo varianta faga M13 so vstroennym chuzherodnym peptidom v osnovnoi belok obolochki", Dokl Akad Nauk SSSR, (1989), 307(2)481-3.

INOU82: Inouye, H, W Barnes, and J Beckwith, "Signal Sequence of Alkaline Phosphatase of Escherichia coli", J Bacteriol (1982), 149(2)434-439.

INOU86: Inouye, M, and R Sarma, Editors, Protein Engineering: Applications in Science, Medicine, and Industry, Academic Press, New York, 1986.

ITOK79: Ito, K, G Mandel, and W Wickner, "Soluble precursor of an integral membrane protein: Synthesis of procoat protein in Escherichia coli infected with bacteriophage M13.", Proc Natl Acad Sci USA (1979), 76:1199-1203.

JANA89: Janatova, J, KBM Reid, and AC Willis, "Disulfide Bonds Are Localized within the Short Consensus Repeat Units of Complement Regulatory Proteins: C4b-Binding Protein", Biochem (1989), 28:4754-61.

JANI85: Janin, J, and C Chothia, "Domains in Proteins: Definitions, Location, and Structural Principles", Methods in Enzymology (1985), 115(28)420-430.

JENN89: Jennings, PA, MM Bills, DO Irving, and JS Mattick, "Fimbriae of Bacteroides nodosus: protein engineering of the structural subunit for the production of an exogenous peptide", Protein Eng, (Jan 1989), 2(5)365-9.

JERI74a: Jering, H, and H Tschesche, "Replacement of Lysine by Arginine, Phenylalanine, and Tryptophan in the Reactive Site of the Trypsin-Kallikrein Inhibitor (Kunitz)", Angew Chem internat Edit (1974), 13:662-3.

JERI76b: Jering, H, and H Tschesche, "Replacement of Lysine by Arginine, Phenylalanine, and Tryptophan in the Reactive Site of the Bovine Trypsin-Kallikrein Inhibitor (Kunitz) and Change of the Inhibitory Properties", Eur J Biochem (1976), 61:453-63.

JOUB84: Joubert, FJ, "Trypsin Isoinhibitors from Momordica Repens Seeds", Phytochemistry (1984), 23:1401-6.

JUDD85: Judd, RC, "Structure and surface exposure of protein IIs of Neisseria gonorrhoeae JS3", Infect Immun (1985), 48(2)452-7.

JUDD86: Judd, RC, "Evidence for N-terminal exposure of the protein IA subclass of Neisseria gonorrhoeae protein I", Infect Immun (1986), 54(2)408-14.

KABS84: Kabsch, W, and C Sander, "On the use of sequence homologies to predict protein structure: identical pentapeptides can have completely different conformations", Proc Natl Acad Sci USA (1984), 81(4)1075-8.

KAIS87a: Kaiser, CA, D Preuss, P Grisafi, and D Botstein, "Many Random Sequences Functionally Replace the Secretion Signal Sequence of Yeast Invertase", Science (1987), 235:312-7.

KAOR88: Kao, RC, NG Wehner, KM Skubitz, BH Gray, and JR Hoidal, "Proteinase 3, A Distinct Human Polymorphonuclear Leukocyte Proteinase that Produces Emphysema in Hamsters", J Clin Invest (1988), 82:1963-73.

KAPL78: Kaplan, DA, L Greenfield, and G Wilcox, "Molecular Cloning of Segments of the M13 Genome.", in The Single-Stranded DNA Phages, Denhardt, DT, D Dressler, and DS Ray editors, Cold Spring Harbor Laboratory, 1978, p461-467.

KATZ86: Katz, BA, and A Kossiakoff, "The Crystallographically Determined Structures of Atypical Strained Disulfides Engineered into Subtilisin", J Biol Chem (1986), 261(33)15480-85.

KATZ90: Katz, B, and AA Kossiakoff, "Crystal Structures of Subtilisin BPN' Variants Containing Disulfide Bonds and Cavities: Concerted Structural Rearrangements Induced by Mutagenesis", Proteins, Struct, Funct, and Genet (1990), 7:343-57.

KAUM86: Kaumerer, JF, JO Polazzi, and MP Kotick, "The mRNA for a proteinase inhibitor related to the HI-30 domain of inter-α-trypsin inhibitor also encodes α₁-microglobulin (protein HC)", Nucleic Acids Res (1986), 14:7839-7850.

KIDO88: Kido, H, Y Yokogoshi, and N Katunuma, "Kunitz-type Protease Inhibitor Found in Rat Mast Cells", J Biol Chem (1988), 263:18104-7.

KIDO90: Kido, H, A Fukutomi, J Schelling, Y Wang, B Cordell, and N Katunuma, "Protease-Specificity of Kunitz Inhibitor Domain of Alzheimer's Disease Amyloid Protein Precursor", Biochem & Biophys Res Comm (16 Mar 1990), 167(2)716-21.

KING86: King, TC, R Sirdeshmukh, and D Schlessinger, "Nucleolytic processing of ribonucleic acid transcripts in procaryotes", Microbiol Rev (1986), 50(4)428-51.

KISH85: Kishore, R, and P Balaram, "Stabilization of gamma-Turn Conformations in Peptides by Disulfide Bridges", Biopolymers (1985), 24:2041-43.

KOBA89: Kobayashi, Y, T Ohkubo, Y Kyogoku, Y Nishiuchi, S Sakakibara, W Braun, and N Go, "Solution Conformation of Conotoxin GI Determined by ¹H Nuclear Magnetic Resonance Spectroscopy and Distance Geometry Calculations", Biochemistry (1989), 28:4853-60.

KUBO89: Kubota, H, Y Hidaka, H Ozaki, H Ito, T Hirayama, Y Takeda, and Y Shimonishi, "A Long-acting Heat-Stable Enterotoxin Analog of Enterotoxigenic Escherichia coli with a Single D-Amino Acid.", Biochem Biophys Res Comm (1989), 161:229-235.

KUHN85a: Kuhn, A, and W Wickner, "Conserved Residues of the Leader Peptide Are Essential for Cleavage by Leader Peptidase.", J Biol Chem (1985), 260:15914-15918.

KUHN85b: Kuhn, A, and W Wickner, "Isolation of Mutants in M13 Coat Protein That Affect Its Synthesis, Processing, and Assembly into Phage.", J Biol Chem (1985), 260:15907-15913.

KUHN87: Kuhn, A, "Bacteriophage M13 Procoat Protein Inserts into the Plasma Membrane as a Loop Structure.", Science (1987), 238:1413-1415.

KUHN88: Kuhn, A, "Alterations in the extracellular domain of M13 procoat protein make its membrane insertion dependent on secA and secY", Eur J Biochem (1988), 177(2)267-71.

KUKS89: Kuks, PFM, C Creminon, A-M Leseney, J Bourdais, A Morel, and P Cohen, "Xenopus laevis Skin Arg-Xaa-Val-Arg-Gly-endoprotease", J Biol Chem (1989), 264(25)14609-12.

KUOM90: Kuo, MD, SS Huang, and JS Huang, "Acidic fibroblast growth factor receptor purified from bovine liver is a novel protein tyrosine kinase.", J Biol Chem (1990), 265(27)16455-63.

KUPE90: Kupersztoch, YM, K Tachias, CR Moomaw, LA Dreyfus, R Urban, C Slaughter, and S Whipp, "Secretion of Methanol-Insoluble Heat-Stable Enterotoxin (ST_(B)): Energy- and secA-Dependent Conversion of Pre-ST_(B) to an Intermediate Indistinguishable from the Extracellular Toxin", J Bacteriol (1990), 172(5)2427-32.

LAMB90: Lambert, P, H Kuroda, N Chino, TX Watanabe, T Kimura, and S Sakakibara, "Solution Synthesis of Charybdotoxin (ChTX), A K⁺ Channel Blocker", Biochem Biophys Res Comm (1990), 170(2)684-690.

LAND87: Landick, R, and C Yanofsky, "Transcription Attenuation", Volume 2, Chapter 77, p 1276-1301, Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, Neidhardt, FC, Editor-in-Chief, Amer Soc for Microbiology, Washington, DC, 1987.

LASK80: Laskowski, M, Jr, and I Kato, "Protein Inhibitors of Proteases", Ann Rev Biochem (1980), 49:593-626.

LAZU83: Lazure, C, NG Seidah, M Chretien, R Lallier, and S St-Pierre, "Primary structure determination of Escherichia coli heat-stable enterotoxin of porcine origin", Canadian J Biochem Cell Biol (1983), 61:287-92.

LECO87: Lecomte, JTJ, D Kaplan, M Llinas, E Thunberg, and G Samuelsson, "Proton Magnetic Resonance Characterization of Phoratoxins and Homologous Proteins Related to Crambin", Biochemistry (1987), 26:1187-94.

LEEB71: Lee, B, and FM Richards, "The interpretation of protein structures: estimation of static accessibility.", J Mol Biol (1971), 55:(3)379-400.

LEEC83: Lee, CH, SL Moseley, HW Moon, SC Whipp, CL Gyles, and M So, "Characterization of the Gene Encoding Heat-Stable Toxin II and Preliminary Molecular Epidemiological Studies of Enterotoxigenic Escherichia coli Heat-Stable Toxin II Producers", Infection and Immunity (1983), 42:264-268.

LEEC86: Lee, C, and J Beckwith, "Cotranslational and Posttranslational Protein Translocation in Prokaryotic Systems.", Ann Rev Cell Biol (1986), 2:315-336.

LENG89b: Le-Nguyen, D, D Nalis, and B Castro, "Solid phase synthesis of a trypsin inhibitor isolated from the Cucurbitaceae Ecballium elaterium", Int J Peptide Protein Res (1989), 34:492-97.

LISS85: Liss, LR, BL Johnson, and DB Oliver, "Export defect adjacent to the processing site of staphylococcal nuclease is suppressed by a prlA mutation", J Bacteriol (1985), 164(2)925-8.

LOPE85a: Lopez, J, and RE Webster, "Assembly site of bacteriophage f1 corresponds to adhesion zones between the inner and outer membranes of the host cell", J Bacteriol (1985), 163(3)1270-4.

LOPE85b: Lopez, J, and RE Webster, "fipB and fipC: two bacterial loci required for morphogenesis of the filamentous bacteriophage f1", J Bacteriol (1985), 163(3)900-5.

LOSI86: Losick, R, P Youngman, and PJ Piggot, "Genetics of Endospore formation in Bacillus subtilis", Ann Rev Genet (1986), 20:625-669.

LUGT83: Lugtenberg, B, and L van Alphen, "Molecular Architecture and Function of the Outer Membrane of Escherichia coli and other Gram-Negative Bacteria", Biochim Biophys Acta (1983), 737:51-115.

LUIT83: Luiten, RGM, JGG Schoenmakers, and RNH Konings, "The major coat protein gene of the filamentous Pseudomonas aeruginosa phage Pf3: absence of an N-terminal leader signal sequence", Nucleic Acids Research (1983), 11(22)8073-85.

LUIT85: Luiten, RGM, DG Putterman, JGG Schoenmakers, RNH Konings, and LA Day, "Nucleotide Sequence of the Genome of Pf3, an IncP-1 Plasmid-Specific Filamentous Bacteriophage of Pseudomonas aeruginosa", J Virology, (1985), 56(1)268-276.

LUIT87: Luiten, RGM, RIL Eggen, JGG Schoenmakers, and RNH Konings, "Spontaneous Deletion Mutants of Bacteriophage Pf3: Mapping of Signals Involved in Replication and Assembly", DNA (1987), 6(2)129-37.

LUND86: Lundeen, M, "Preferences of the Side Chains in Proteins for Helix, Beta Strand, Turn, and Other Conformations. Secondary Structures of Copper Proteins", J Inorgan Biochem (1986), 27:151-62.

MACH89: Machleidt, W, U Thiele, B Laber, I Assfalg-Machleidt, A Esterl, G Wiegand, J Kos, V Turk, and W Bode, "Mechanism of inhibition of papain by chicken egg white cystatin", FEBS Lett (1989), 243(2)234-8.

MACI88: MacIntyre, S, R Freudl, ML Eschbach, and U Henning, "An artificial hydrophobic sequence functions as either an anchor or a signal sequence at only one of two positions within the Escherichia coli outer membrane protein OmpA", J Biol Chem (1988), 263(35)19053-9.

MAKO80: Makowski, L, DLD Caspar, and DA Marvin, "Filamentous Bacteriophage Pf1 Structure Determined at 7 Å Resolution by Refinement of Models for the alpha-Helical Subunit.", J Mol Biol (1980), 140:149-181.

MALA64: Malamy, MH, and BL Horecker, "Release of alkaline phosphatase from cells of E. coli upon lysozyme spheroplast formation", Biochem (1964), 3:1889-1893.

MANI82: Maniatis, T, EF Fritsch, and J Sambrook, Molecular Cloning, Cold Spring Harbor Laboratory, 1982.

MANO86: Manoil, C, and J Beckwith, "A Genetic Approach to Analyzing Membrane Protein Topology", Science (1986), 233:1403-1408.

MANO88: Manoil, C, D Boyd, and J Beckwith, "Molecular genetic analysis of membrane protein topology", Topics in Genetics (1988), 4(8)223-6.

MARK86: Marks, CB, M Vasser, P Ng, W Henzel, and S Anderson, "Production of native, correctly folded bovine pancreatic trypsin inhibitor in Escherichia coli", J Biol Chem (1986), 261:7115-7118.

MARK87: Marks, CB, H Naderi, PA Kosen, ID Kuntz, and S Anderson, "Mutants of Bovine Pancreatic Trypsin Inhibitor Lacking Cysteines 14 and 38 Can Fold Properly", Science (1987), 235:1370-1373.

MARQ83: Marquart, M, J Walter, J Deisenhofer, W Bode, and R Huber, "The geometry of the reactive site and of the peptide groups in trypsin, trypsinogen, and its complexes with inhibitors", Acta Cryst, B (1983), 39:480ff.

MARV75: Marvin, DA, and EJ Wachtel, "Structure and assembly of filamentous bacterial viruses", Nature (1975), 253:19-23.

MARV78: Marvin, DA, "Structure of the Filamentous Phage Virion.", in The Single-Stranded DNA Phages, Denhardt, DT, D Dressler, and DS Ray editors, Cold Spring Harbor Laboratory, 1978, p583-603.

MARV80: Marvin, D, and L Makowski, "Helical Viruses", Progr Clin Biol Res (1980), 40:347-48.

MASS90: Massefski, W, Jr, AG Redfield, DR Hare, and C Miller, "Molecular Structure of Charybdotoxin, a Pore-Directed Inhibitor of Potassium Ion Channels", Science (3 Aug 1990), 249:521-524.

MATS89: Matsumura, M, WJ Becktel, M Levitt, and BW Matthews, "Stabilization of phage T4 lysozyme by engineered disulfide bonds", Proc Natl Acad Sci USA (1989), 86:6562-6.

MCCA90: McCafferty, J, AD Griffiths, G Winter, and DJ Chiswell, "Phage antibodies: filamentous phage displaying antibody variable domains", Nature, (6 Dec 1990), 348:552-4.

MCKE85: McKern, NM, IJ O'Donnell, DJ Stewart, and BL Clark, "Primary structure of pilin protein from Bacteroides nodosus strain 216: comparison with the corresponding protein from strain 198", J Gen Microbiol (1985), 131(Pt 1)1-6.

MCPH85: McPhalen, CA, HP Schnebli, and MNG James, "Crystal and molecular structure of the inhibitor eglin from leeches in complex with subtilisin Carlsberg", FEBS Lett (1985), 188(1)55-8.

MCWH89: McWherter, CA, WF Walkenhorst, EJ Campbell, and GI Glover, "Novel Inhibitors of Human Leukocyte Elastase and Cathepsin G. Sequence Variants of Squash Seed Protease Inhibitor with Altered Protease Selectivity", Biochemistry (1989), 28:5708-14.

MEDV89: Medved, LV, TF Busby, and KC Ingham, "Calorimetric Investigation of the Domain Structure of Human Complement C1s. Reversible Unfolding of the Short Consensus Repeat Units", Biochem (1989), 28:5408-14.

MESS77: Messing, J, B Gronenborn, B Muller-Hill, and PH Hofschneider, "Filamentous coliphage M13 as a cloning vehicle: insertion of a HindII fragment of the lac regulatory region in M13 replicative form in vitro.", Proc Natl Acad Sci USA (1977), 74:3642-6.

MESS78: Messing, J, and B Gronenborn, "The Filamentous Phage M13 as a Carrier DNA for Operon Fusions In Vitro.", in The Single-Stranded DNA Phages, Denhardt, DT, D Dressler, and DS Ray editors, Cold Spring Harbor Laboratory, 1978, p449-453.

MILL87a: Miller, S, J Janin, AM Lesk, and C Chothia, "Interior and Surface of Monomeric Proteins", J Mol Biol (1987), 196:641-656.

MILL87b: Miller, ES, J Karam, M Dawson, M Trojanowska, P Gauss, and L Gold, "Translational repression: biological activity of plasmid-encoded bacteriophage T4 RegA protein.", J Mol Biol (1987), 194:397-410.

MISR88a: Misra, R, and SA Benson, "Genetic identification of the pore domain of the OmpC porin of Escherichia coli K-12", J Bacteriol (1988), 170(8)3611-7.

MISR88b: Misra, R, and SA Benson, "Isolation and Characterization of OmpC Porin Mutants with Altered Pore Properties", J Bacteriol (1988), 170:528-33.

MOLL89: Molla, A, A Charbit, A Le Guern, A Ryter, and M Hofnung, "Antibodies against synthetic peptides and the topology of LamB, an outer membrane protein from Escherichia coli K12", Biochem (1989), 28(20)8234-41.

MORS87: Morse, SA, TA Mietzner, G Bolen, A Le Faou, and G Schoolnik, "Characterization of the major iron-regulated protein of Neisseria gonorrhoeae and Neisseria meningitidis", Antonie Van Leeuwenhoek (1987), 53(6)465-9.

MORS88: Morse, SA, C-Y Chen, A LeFaou, and TA Mietzner, "A Potential Role for the Major Iron-Regulated Protein Expressed by Pathogenic Neisseria Species", Rev Infect Dis (1988), 10(Suppl 2)S306-10.

MOSE82: Moses, PB, and K Horiuchi, "Effects of Transposition and Deletion upon Coat Protein Gene Expression in Bacteriophage f1", Virology (1982), 119:231-244.

MOSE83: Moser, R, RM Thomas, and B Gutte, "An Artificial Crystalline DDT-binding polypeptide", FEBS Letters (1983), 157:247-251.

MOSE85: Moser, R, S Klauser, T Leist, H Langen, T Epprecht, and B Gutte, "Applications of Synthetic Peptides", Angew Chemie, Int Edition English (1985), 24(9)719-27.

MOSE87: Moser, R, S Frey, K Muenger, T Hehlgans, S Klauser, H Langen, E-L Winnacker, R Mertz, and B Gutte, "Expression of the synthetic gene of an artificial DDT-binding polypeptide in Escherichia coli", Protein Engineering (1987), 1:339-343.

NADE87: Nadel, JA, and B Borson, "Secretion and ion transport in airways during inflammation", Biorheology (1987), 24:541-549.

NADE90: Nadel, JA, "Neutrophil Proteases and Mucus Secretion", 1990 Cystic Fibrosis Meeting, Arlington, Va., p156.

NAKA81: Nakashima, Y, B Frangione, RL Wiseman, WH Konigsberg, "Primary Structure of the Major Coat Protein of the Filamentous Bacterial Viruses, If1 and Ike", J Biol Chem (1981), 256(11)5792-7.

NAKA86a: Nakae, T, J Ishii, and T Ferenci, "The Role of the Maltodextrin-binding Site in Determining the Transport Properties of the LamB Protein", J Biol Chem (1986), 261:622-26.

NAKA86b: Nakae, T, "Outer-Membrane Permeability of Bacteria", CRC Crit Rev Microbiol (1986), 13:1-62.

NAKA87: Nakamura, T, T Hirai, F Tokunaga, S Kawabata, and S Iwanaga, "Purification and Amino Acid Sequence of Kunitz-type Protease Inhibitor Found in the Hemocytes of Horseshoe Crab (Tachypleus tridentatus)", J Biochem (1987), 101:1297-1306.

NICH88: Nicholson, H, WJ Becktel, and BW Matthews, "Enhanced protein thermostability from designed mutations that interact with α-helix dipoles", Nature (1988), 336:651-56.

NIKA84: Nikaido, H, and HCP Wu, "Amino acid sequence homology among the major outer membrane proteins of Escherichia coli", Proc Natl Acad Sci USA (1984), 81:1048-52.

NILE89: Niles, JL, RT McCluskey, MF Ahmad, and MA Arnaout, "Wegener's Granulomatosis Autoantigen Is a Novel Neutrophil Serine Proteinase", Blood (1989), 74(6)1888-93.

NISH82: Nishiuchi, Y, and S Sakakibara, "Primary and secondary structure of conotoxin GI, a neurotoxic tridecapeptide from a marine snail", FEBS Lett (1982), 148:260-2.

NISH86: Nishiuchi, Y, K Kumagaye, Y Noda, TX Watanabe, and S Sakakibara, "Synthesis and secondary-structure determination of omega-conotoxin GVIA: a 27-peptide with three intramolecular disulfide bonds", Biopolymers, (1986), 25:S61-8.

NORR89a: Norris, K, and LC Petersen, "Aprotinin analogues and process for the production thereof", European Patent Application 0 339 942 A2.

NORR89b: Norris, K, F Norris, S Bjorn, "Aprotinin Homologues and Process for the Production of Aprotinin and aprotinin homologues in Yeast", PCT patent application WO89/01968.

OAST88: Oas, TG, and PS Kim, "A peptide model of a protein folding intermediate", Nature (1988), 336:42-48.

ODOM90: Odom, L, "Inter-α-trypsin inhibitor: a plasma proteinase inhibitor with a unique chemical structure", Int J Biochem (1990), 22:925-930.

OHKA81: Ohkawa, I, and RE Webster, "The Orientation of the Major CoatProtein of Bacteriophage fl in the Cytoplasmic Membrane of Esherichiacoli.", J Biol Chem (1981), 256:9951-9958.

OKAM87: Okamoto, K, K Okamoto, J Yukitake, Y Kawamoto, and A Miyama,"Substitutions of Cysteine Residues of Escherichia coli Heat-StableEnterotoxin by Oligonucleotide-Directed Mutagenesis", Infection andImmunity (1987), 55:2121-2125.

OKAM88: Okamoto, K, K Okamoto, J Yukitake, and A Miyama, "Reduction of Enterotoxic Activity of Escherichia coli Heat-Stable Enterotoxin by Substitution for an Aspartate Residue", Infection and Immunity (1988), 56:2144-8.

OKAM90: Okamoto, K, and M Takahara, "Synthesis of Escherichia coli Heat-Stable Enterotoxin STp as a Pre-Pro Form and Role of the Pro Sequence in Secretion", J Bacteriol (1990), 172(9)5260-65.

OLIP86: Oliphant, AR, AL Nussbaum, and K Struhl, "Cloning of random-sequence oligodeoxynucleotides", Gene (1986), 44:177-183.

OLIP87: Oliphant, AR, and K Struhl, "The Use of Random-Sequence Oligonucleotides for Determining Consensus Sequences", in Methods in Enzymology 155 (1987)568-582. Editor Wu, R; Academic Press, New York.

OLIV85a: Oliver, D, "Protein Secretion in Escherichia coli.", Ann Rev Microbiol (1985), 39:615-648.

OLIV85b: Olivera, BM, WR Gray, R Zeikus, JM McIntosh, J Varga, J Rivier, V de Santos, and LJ Cruz, "Peptide Neurotoxins from Fish Hunting Cone Snails", Science (1985), 230:1338-43.

OLIV87b: Olivera, BM, LJ Cruz, V de Santos, GW LeCheminant, D Griffin, R Zeikus, JM McIntosh, R Galyean, J Varga, WR Gray, et al., "Neuronal calcium channel antagonists. Discrimination between calcium channel subtypes using omega-conotoxin from Conus magus venom", Biochemistry, (1987), 26(8)2086-90.

OLIV90a: Olivera, BM, J Rivier, C Clark, CA Ramilo, GP Corpuz, FC Abogadie, EE Mena, SR Woodward, DR Hillyard, LJ Cruz, "Diversity of Conus Neuropeptides", Science, (20 July 1990), 249:257-263.

OLIV90b: Olivera, BM, DR Hillyard, J Rivier, S Woodward, WR Gray, G Corpuz, LJ Cruz, "Conotoxins: Targeted Peptide Ligands from Snail Venoms", Chapter 20 in Marine Toxins, American Chemical Society, 1990.

OLTE89: Oltersdorf, T, LC Fritz, DB Schenk, I Lieberburg, KL Johnson-Wood, EC Beattie, PJ Ward, RW Blacher, HF Dovey, and S Sinha, "The Secreted form of the Alzheimer's amyloid precursor protein with the Kunitz domain is protease nexin-II", Nature (1989), 341:144-7.

ORND85: Orndorff, PE, and S Falkow, "Nucleotide Sequence of pilA, the Gene Encoding the Structural Component of Type 1 Pili in Escherichia coli", J Bacteriol (1985), 162:454-7.

OTLE85: Otlewski, J, and T Wilusz, "The Serine Proteinase Inhibitor from Summer Squash (Cucurbita pepo): Some Structural Features, Stability and Proteolytic Degradation", Acta Biochim Polonica (1985), 32(4)285-93.

OTLE87: Otlewski, J, H Whatley, A Polanowski, and T Wilusz, "Amino-Acid Sequences of Trypsin Inhibitors from Watermelon (Citrullus vulgaris) and Red Bryony (Bryonia dioica) Seeds", Biol Chem Hoppe-Seyler (1987), 368:1505-7.

PABO79: Pabo, CO, RT Sauer, JM Sturtevant, and M Ptashne, "The Lambda Repressor Contains Two Domains.", Proc Natl Acad Sci USA (1979), 76:1608-1612.

PABO86: Pabo, CO, and EG Suchanek, "Computer-Aided Model Building Strategies for Protein Design", Biochem (1986), 25:5987-91.

PAGE88: Pages, JM, and JM Bolla, "Assembly of the OmpF porin of Escherichia coli B. Immunological and kinetic studies of the integration pathway", Eur J Biochem (1988), 176(3)655-60.

PAGE90: Pages, JM, JM Bolla, A Bernadac, and D Fourel, "Immunological approach of assembly and topology of OmpF, an outer membrane protein of Escherichia coli", Biochimie (1990), 72:169-76.

PAKU86: Pakula, AA, VB Young, and RT Sauer, "Bacteriophage λ cro mutations: Effects on activity and intracellular degradation.", Proc Natl Acad Sci USA (1986), 83:8829-8833.

PANT87: Pantoliano, MW, RC Ladner, PN Bryan, ML Rollence, JF Wood, and TL Poulos, "Protein Engineering of Subtilisin BPN': Enhanced Stabilization through the Introduction of Two Cysteines To Form a Disulfide Bond", Biochem (1987), 26:2077-82.

PANT90: Pantoliano, MW, and RC Ladner, "Computer Designed Stabilized Proteins and Method for Producing Same", U.S. Pat. No. 4,908,773, Mar. 13, 1990.

PAOL86: Paoletti, E, and D Panicali, "Modified Vaccinia Virus", U.S. Pat. No. 4,603,112, Jul. 29, 1986.

PAPA82: Papamokos, E, E Weber, W Bode, R Huber, M Empie, I Kato, and M Laskowski Jr, "Crystallographic Refinement of Japanese Quail Ovomucoid, a Kazal-type Inhibitor, and Model Building Studies of Complexes with Serine Proteases", J Mol Biol (1982), 158:515-537.

PARD89: Pardi, A, A Galdes, J Florance, and D Maniconte, "Solution Structures of α-Conotoxin G1 Determined by Two-Dimensional NMR Spectroscopy", Biochemistry (1989), 28:5494-5501.

PARG87: Parge, HE, DE McRee, MA Capozza, SL Bernstein, ED Getzoff, and JA Tainer, "Three dimensional structure of bacterial pili", Antonie Van Leeuwenhoek (1987), 53(6)447-53.

PARM88: Parmley, SF, and GP Smith, "Antibody-selectable filamentous fd phage vectors: affinity purification of target genes", Gene (1988), 73:305-318.

PARR88: Parraga, G, SJ Horvath, A Eisen, WE Taylor, L Hood, ET Young, RE Klevit, "Zinc-Dependent Structures of a Single-Finger Domain of Yeast ADR1", Science (1988), 241:1489-92.

PEAS88: Pease, JHB, and DE Wemmer, Biochem (1988), 27:8491-99.

PEAS90: Pease, JHB, RW Storrs, and DE Wemmer, "Folding and activity of hybrid sequence, disulfide-stabilized peptides", Proc Natl Acad Sci USA (1990), 87:5643-47.

PEET85: Peeters, BPH, RM Peters, JGG Schoenmakers, and RNH Konings, "Nucleotide Sequence and Genetic Organization of the Genome of the N-Specific Filamentous Bacteriophage IKe: Comparison with the Genome of the F-Specific Filamentous Phages M13, fd, and f1", J Mol Biol (1985), 181:27-39.

PEET87: Peeters, BPH, JGG Schoenmakers, and RNH Konings, "Comparison of the DNA Sequences Involved in Replication and Packaging of the Filamentous Phages IKe, and Ff (M13, fd, and f1)", DNA (1987), 6(2)139-147.

PERR84: Perry, LJ, and R Wetzel, "Disulfide Bond Engineered into T4 Lysozyme: Stabilization of the Protein Toward Thermal Inactivation", Science (1984), 226:555-7.

PERR86: Perry, LJ, and R Wetzel, "Unpaired Cysteine-54 Interferes with the Ability of an Engineered Disulfide To Stabilize T4 Lysozyme", Biochem (1986), 25:733-39.

PETE89: Peterson, MW, "Neutrophil cathepsin G increases transendothelial albumin flux", J Lab Clin Med (1989), 113(3)297-308.

PONT88: Ponte, P, P Gonzalez-DeWhitt, J Schilling, J Miller, D Hsu, B Greenberg, K Davis, W Wallace, I Lieberburg, F Fuller, and B Cordell, "A new A4 amyloid mRNA contains a domain homologous to serine proteinase inhibitors", Nature (1988), 331:525-7.

POTE83: Poteete, AR, "Domain Structure and Quaternary Organization of the Bacteriophage P22 Erf Protein.", J Mol Biol (1983), 171:401-418.

QUIO87: Quiocho, FA, NK Vyas, JS Sack and MA Storey, "Periplasmic Binding Proteins: Structure and New Understanding of Protein-Ligand Interactions.", in Crystallography in Molecular Biology, Moras, D. et al., editors, Plenum Press, 1987.

RAND87: Randall, LL, SJS Hardy, and JR Thom, "Export of Protein: A Biochemical View", Ann Rev Microbiol (1987), 41:507-41.

RASC86: Rasched, I, and E Oberer, "Ff Coliphages: Structural and Functional Relationships", Microbiol Rev (1986), 50:401-427.

RASH84: Rashin, A, "Prediction of Stabilities of Thermolysin Fragments", Biochemistry (1984), 23:5518.

RAYC87: Ray, C, KM Tatti, CH Jones, and CP Moran Jr, "Genetic Analysis of RNA Polymerase-Promoter Interaction during Sporulation in Bacillus subtilis", J Bacteriol (1987), 169(5)1807-1811.

REID88a: Reidhaar-Olson, JF, and RT Sauer, "Combinatorial Cassette Mutagenesis as a Probe of the Information Content of Protein Sequences", Science (1988), 241:53-57.

REID88b: Reid, J, H Fung, K Gehring, PE Klebba, and H Nikaido, "Targeting of porin to the outer membrane of Escherichia coli. Rate of trimer assembly and identification of a dimer intermediate", J Biol Chem (1988), 263(16)7753-9.

REST88: Rest, RF, "Human Neutrophil and Mast Cell Proteases Implicated in Inflammation", Meth Enzymol (1988), 163:309-27.

RICH81: Richardson, JS, "The Anatomy and Taxonomy of Protein Structure", Adv Protein Chemistry (1981), 34:167-339.

RICH86: Richards, JH, "Cassette mutagenesis shows its strength.", Nature (1986), 323:187.

RITO83: Ritonja, A, B Meloun, and F Gubensek, "The Primary Structure of Vipera ammodytes venom chymotrypsin inhibitor", Biochim Biophys Acta (1983), 746:138-145.

RIVI87b: Rivier, J, R Galyean, WR Gray, A Azimi-Zonooz, JM McIntosh, LJ Cruz, and BM Olivera, "Neuronal calcium channel inhibitors. Synthesis of omega-conotoxin GVIA and effects on 45Ca uptake by synaptosomes", J Biol Chem, (1987), 262(3)1194-8.

ROBE86: Roberts, S, and AR Rees, "The cloning and expression of an anti-peptide antibody: a system for rapid analysis of the binding properties of engineered antibodies.", Protein Engineering (1986), 1:59-65.

RONC90: Ronco, J, A Charbit, and M Hofnung, "Creation of targets for proteolytic cleavage in the LamB protein of E coli K12 by genetic insertion of foreign sequences: implications for topological studies", Biochimie (1990), 72(2-3)183-9.

ROSE85: Rose, GD, "Automatic Recognition of Domains in Globular Proteins", Methods in Enzymology (1985), 115(29)430-440.

ROSS81: Rossman, M, and P Argos, "Protein Folding.", Ann Rev Biochem (1981), 50:497-532.

RUEH73: Ruehlmann, A, D Kukla, P Schwager, K Bartels, and R Huber, "Structure of the Complex formed by Bovine Trypsin and Bovine Pancreatic Trypsin Inhibitor: Crystal Structure Determination and Stereochemistry of the Contact Region", J Mol Biol (1973), 77:417-436.

RUSS81: Russel, M, and P Model, "A mutation downstream from the signal peptidase cleavage site affects cleavage but not membrane insertion of phage coat protein.", Proc Natl Acad Sci USA (1981), 78:1717-1721.

SALI64: Salivar, WO, H Tzagoloff, and D Pratt, "Some physical, chemical, and biological properties of the rod-shaped coliphage M13", Virology (1964), 24:359-71.

SALI87: Salier, JP, M Diarra-Mehrpour, R Sesboue, J Bourguignon, R Benarous, I Ohkubo, S Kurachi, K Kurachi, and JP Martin, "Isolation and characterization of cDNAs encoding the heavy chain of human inter-alpha-trypsin inhibitor (IaTI): Unambiguous evidence for multipolypeptide chain structure of IaTI", Proc Natl Acad Sci USA (1987), 84:8271-8276.

SALI88: Sali, D, M Bycroft, and AR Fersht, "Stabilization of protein structure by interaction of α-helix dipole with a charged side chain", Nature (1988), 335:740-3.

SALI90: Salier, J-P, "Inter-α-trypsin inhibitor: emergence of a family within the Kunitz-type protease inhibitor superfamily", TIBS (1990), 15:435-439.

SALV87: Salvesen, G, D Farley, J Shuman, A Przybyla, C Reilly, and J Travis, "Molecular Cloning of Human Cathepsin G: Structural Similarity to Mast Cell and Cytotoxic T Lymphocyte Proteinases", Biochem (1987), 26:2289-93.

SAMB89: Sambrook, J, EF Fritsch, and T Maniatis, Molecular Cloning, A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory, 1989.

SASA84: Sasaki, T, "Amino Acid Sequence of a Novel Kunitz-type chymotrypsin inhibitor from hemolymph of silkworm larvae, Bombyx mori", FEBS Lett (1984), 168:227-230.

SAUE86: Sauer, RT, K Hehir, RS Stearman, MA Weiss, A Jeitler-Nilsson, EG Suchanek, and CO Pabo, "An Engineered Intersubunit Disulfide Enhances the Stability and DNA Binding of the N-terminal Domain of λ Repressor", Biochem (1986), 25:5992-98.

SCHA78: Schaller, H, E Beck, and M Takanami, "Sequence and Regulatory Signals of the Filamentous Phage Genome.", in The Single-Stranded DNA Phages, Denhardt, D.T., D. Dressler, and D.S. Ray editors, Cold Spring Harbor Laboratory, 1978, p139-163.

SCHN86: Schnabel, E, W Schroeder, and G Reinhardt, "[Ala₂¹⁴,³⁸]Aprotinin: Preparation by Partial Desulphurization of Aprotinin by Means of Raney Nickel and Comparison with other Aprotinin Derivatives", Biol Chem Hoppe-Seyler (1986), 367:1167-76.

SCHN88a: Schnabel, E, G Reinhardt, W Schroeder, H Tschesche, HR Wenzel, and A Mehlich, "Enzymatic Resynthesis of the `Reactive Site` Bond in the Modified Aprotinin Derivatives [Seco-15/16]Aprotinin and [Di-seco-15/16,39/40]Aprotinin", Biol Chem Hoppe-Seyler (1988), 369:461-8.

SCHU79: Schulz, GE, and RH Schirmer, Principles of Protein Structure, Springer-Verlag, New York, 1979.

SCHW87: Schwarz, H, HJ Hinz, A Mehlich, H Tschesche, and HR Wenzel, "Stability studies on derivatives of the bovine pancreatic trypsin inhibitor.", Biochemistry (1987), 26(12)3544-51.

SCOT87a: Scott, MJ, CS Huckaby, I Kato, WJ Kohr, M Laskowski Jr., M-J Tsai and BW O'Malley, "Ovoinhibitor Introns Specify Functional Domains as in the Related and Linked Ovomucoid Gene", J Biol Chem (1987), 262(12)5899-5907.

SCOT87b: Scott, CF, HR Wenzel, HR Tschesche, and RW Colman, "Kinetics of Inhibition of Human Plasma Kallikrein by a Site-Specific Modified Inhibitor Arg¹⁵-Aprotinin: Evaluation Using a Microplate System and Comparison With Other Proteases", Blood (1987), 69:1431-6.

SCOT90: Scott, JK, and GP Smith, "Searching for Peptide Ligands with an Epitope Library", Science, (27 July 1990), 249:386-390.

SEKI85: Sekizaki, T, H Akashi, and N Terakado, "Nucleotide sequences of the genes for Escherichia coli heat-stable enterotoxin I of bovine, avian, and porcine origins", Am J Vet Res (1985), 46:909-12.

SELL87: Selloum, L, M Davril, C Mizon, M Balduyck, and J Mizon, "The effect of the glycosaminoglycan chain removal on some properties of the human urinary trypsin inhibitor", Biol Chem Hoppe-Seyler (1987), 368:47-55.

SERW87: Serwer, P, "Review: Agarose Gel Electrophoresis of Bacteriophages and Related Particles", J Chromatography (1987), 418:345-357.

SHIM87: Shimonishi, Y, Y Hidaka, M Koizumi, M Hane, S Aimoto, T Takeda, T Miwatani, and Y Takeda, "Mode of disulfide bond formation of a heat-stable enterotoxin (STₕ) produced by a human strain of enterotoxigenic Escherichia coli", FEBS Lett (1987), 215:165-170.

SHOR81: Shortle, D, D Koshland, GM Weinstock, and D Botstein, "Segment-directed mutagenesis: Construction in vitro of point mutations limited to a small predetermined region of a circular DNA molecule", Proc Natl Acad Sci USA (1980), 77:5375-79.

SHOR85: Shortle, D, and B Lin, "Genetic Analysis of Staphylococcal Nuclease: Identification of Three Intragenic `Global` Suppressors of Nuclease-Minus Mutations.", Genetics (1985), 110:539-555.

SIEK87: Siekmann, J, HR Wenzel, W Schroeder, H Schutt, E Truscheit, A Arens, E Rauenbusch, WH Chazin, K Wuthrich, and H Tschesche, "Pyroglutamyl-aprotinin, a new aprotinin homologue from bovine lungs-isolation, properties, sequence analysis and characterization using ¹H nuclear magnetic resonance in solution", Biol Chem Hoppe-Seyler (1987), 368:1589-96.

SIEK88: Siekmann, J, HR Wenzel, W Schroeder, and H Tschesche, "Characterization and Sequence Determination of Six Aprotinin homologues from bovine lungs", Biol Chem Hoppe-Seyler (1988), 369:157-163.

SIEK89: Siekmann, J, J Beckmann, A Mehlich, HR Wenzel, H Tschesche, E Schnabel, W Mueller-Esterl, "Immunological Characterization of Natural and Semisynthetic Aprotinin Variants", Biol Chem Hoppe-Seyler (1989), 370:677-81.

SILH77: Silhavy, TJ, HA Shuman, J Beckwith, and M Schwartz, "Use of gene fusions to study outer membrane protein localization in Escherichia coli", Proc Natl Acad Sci USA (1977), 74(12)5411-5415.

SILH85: Silhavy, TJ, and JR Beckwith, "Uses of lac Fusions for the Study of Biological Problems", Microbiol Rev (1985), 49(4)398-418.

SINH90: Sinha, S, HF Dovey, P Seubert, PJ Ward, RW Blacher, M Blaber, RA Bradshaw, M Arici, WC Mobley, and I Lieberburg, "The Protease Inhibitory Properties of the Alzheimer's beta-amyloid Precursor Protein", J Biol Chem (1990), 265(16)8983-5.

SMIT85: Smith, GP, "Filamentous Fusion Phage: Novel Expression Vectors That Display Cloned Antigens on the Virion Surface", Science (1985), 228:1315-1317.

SMIT88a: Smith, GP, "Filamentous Phage Assembly: Morphogenetically Defective Mutants That Do Not Kill the Host", Virology (1988), 167:156-165.

SMIT88b: Smith, GP, "Filamentous Phages as Cloning Vectors", Chapter 3 in Vectors: A Survey of Molecular Cloning Vectors and Their Uses, Editors: RL Rodriguez and DT Denhardt, Butterworth, Boston, 1988.

SODE85: Sodergren, EJ, J Davidson, RK Taylor, and TJ Silhavy, "Selection for Mutants Altered in the Expression or Export of Outer Membrane Porin OmpF", J Bacteriol (1985), 162(3)1047-1053.

SOME85: So, M, E Billyard, C Deal, E Getzoff, P Hagblom, TF Meyer, E Segal, and J Tainer, "Gonococcal Pilus: Genetics and Structure", Curr Top in Microbiol & Immunol (1985), 118:13-28.

SOMM89: Sommerhoff, CP, GH Caughey, WE Finkbeiner, SC Lazarus, CB Basbaum, and JA Nadel, "A Potent Secretagogue for Airway Gland Serous Cells", J Immunol (1989), 142:2450-56.

SOMM90: Sommerhoff, CP, JA Nadel, CB Basbaum, and GH Caughey, "Neutrophil Elastase and Cathepsin G Stimulate Secretion from Cultured Bovine Airway Gland Serous Cells", J Clin Invest (March 1990), 85:682-689.

STAD86: Stader, J, SA Benson, and TJ Silhavy, "Kinetic analysis of lamB mutants suggests the signal sequence plays multiple roles in protein export", J Biol Chem (1986), 261(32)15075-80.

STAD89: Stader, J, LJ Gansheroff, and TJ Silhavy, "New suppressors of signal-sequence mutations, prlG, are linked tightly to the secE gene of Escherichia coli", Genes & Develop (1989), 3:1045-1052.

STAT87: States, DJ, TE Creighton, CM Dobson, and M Karplus, "Conformations of intermediates in the folding of the pancreatic trypsin inhibitor.", J Mol Biol (1987), 195(3)731-9.

STEI85: Steiner, BioScience Repts. (1985), 5:973ff.

STUB90: Stubbs, MT, B Laber, W Bode, R Huber, R Jerala, B Lenarcic, and V Turk, "The refined 2.4 Å X-ray crystal structure of recombinant human stefin B in complex with the cysteine proteinase papain: a novel type of proteinase inhibitor interaction", EMBO J (1990), 9(6)1939-47.

SUNX87: Sun, XP, H Takeuchi, Y Okano, and Y Nozawa, "Effects of synthetic omega-conotoxin GVIA (omega-CgTX GVIA) on the membrane calcium current of an identifiable giant neurone, d-RPLN, of an African giant snail (Achatina fulica Ferussac), measured under the voltage clamp condition", Comp Biochem Physiol [C], (1987), 87(2)363-6.

SUTC87a: Sutcliffe, MJ, I Haneef, D Carney, and TL Blundell, "Knowledge based modelling of homologous proteins, part I: three-dimensional frameworks derived from the simultaneous superposition of multiple structures", Protein Engineering (1987), 1:377-384.

SUTC87b: Sutcliffe, MJ, FRF Hayes, and TL Blundell, "Knowledge based modelling of homologous proteins, part II: rules for the conformations of substituted sidechains", Protein Engineering (1987), 1:385-392.

SVEN82: Svendsen, IB, "Amino Acid Sequence of Serine Protease Inhibitor CI-1 from Barley. Homology with Barley Inhibitor CI-2, Potato Inhibitor I, and Leech Eglin", Carlsberg Res Comm (1982), 47:45-53.

SWAI88: Swaim, MW, and SV Pizzo, "Modification of the tandem reactive centres of human inter-α-trypsin inhibitor with butanedione and cis-dichlorodiammineplatinum(II)", Biochem J (1988), 254:171-178.

TAKA74: Takahashi, H, S Iwanaga, T Kitagawa, Y Hokama, and T Suzuki, "Snake venom proteinase inhibitors. II. Chemical structure of inhibitor II isolated from the venom of Russell's viper (Vipera russelli).", J Biochem (1974), 76:721-733.

TAKA85: Takao, T, N Tominaga, S Yoshimura, Y Shimonishi, S Hara, T Inoue, and A Miyama, "Isolation, primary structure and synthesis of heat-stable 1 enterotoxin produced by Yersinia enterocolitica", Eur J Biochem (1985), 152:199-206.

TAKE90: Takeda, T, GB Nair, K Suzuki, and Y Shimonishi, "Production of a Monoclonal Antibody to Vibrio cholerae Non-O1 Heat-Stable Enterotoxin (ST) Which is Cross-Reactive with Yersinia enterocolitica ST", Infection and Immunity (1990), 58(9)2755-9.

TANK77: Tan, NH, and ET Kaiser, "Synthesis and Characterization of a Pancreatic Trypsin Inhibitor Homologue and a Model Inhibitor", Biochemistry, (1977), 16:1531-41.

THER88: Theriault, NY, JB Carter, and SP Pulaski, "Optimization of Ligation Reaction Conditions in Gene Synthesis", BioTechniques (1988), 6(5)470-473.

THOM83: Thomas, GJ, B Prescott, and LA Day, "Structure Similarity, Difference and Variability in the Filamentous Viruses fd, If1, IKe, Pf1, and Xf", J Mol Biol (1983), 165:321-56.

THOM85a: Thompson, MR, M Luttrell, G Overmann, RA Giannella, "Biological and Immunological Characteristics of ¹²⁵I-4Tyr and -18Tyr Escherichia coli Heat-Stable Enterotoxin Species Purified by High-Performance Liquid Chromatography", Analytical Biochem (1985), 148:26-36.

THOM85b: Thompson, MR, and RA Giannella, "Revised Amino Acid Sequence for a Heat-Stable Enterotoxin Produced by an Escherichia coli Strain (18D) that is Pathogenic for Humans", Infection & Immunity (1985), 47:834-36.

THOM86: Thompson, RC, and K Ohlsson, "Isolation, properties, and complete amino acid sequence of human secretory leukocyte protease inhibitor, a potent inhibitor of leukocyte elastase", Proc Natl Acad Sci USA (1986), 83:6692-96.

THOM88a: Thomas, GJ, Jr, B Prescott, SJ Opella, and LA Day, "Sugar Pucker and Phosphodiester Conformations in Viral Genomes of Filamentous Bacteriophages: fd, If1, IKe, Pf1, Xf, and Pf3", Biochem (1988), 27:4350-57.

THOR88: Thornton, JM, BL Sibinda, MS Edwards, and DJ Barlow, "Analysis,Design, and Modification of Loop Regions in Proteins.", BioEssays (?)SKG 3039 ??????

TOMM82: Tommassen, J, P van der Ley, A van der Ende, H Bergmans, and B Lugtenberg, "Cloning of ompF, the Structural Gene for an Outer Membrane Pore Protein of E. coli K12: Physical Localization and Homology with the phoE Gene", Mol Gen Genet (1982), 185:105-110.

TOMM85: Tommassen, J, P van der Ley, M van Zeijl, and M Agterberg, "Localization of functional domains in E. coli K-12 outer membrane porins", EMBO J (1985), 4(6)1583-7.

TRAB86: Traboni, C, R Cortese, "Sequence of a full length cDNA coding for human protein HC (α₁ microglobulin)", Nucleic Acids Res (1986), 14(15)6340.

TRIA88: Trias, J, EY Rosenberg, and H Nikaido, "Specificity of the glucose channel formed by protein D1 of Pseudomonas aeruginosa", Biochim Biophys Acta (1988), 938:493-496.

TSCH86: Tschesche, H, H Wenzel, R Schmuck, and E Schnabel, "Homologues of Aprotinin with, in place of lysine, other amino acids in position 15, process for their preparation and their use as medicaments", U.S. Pat. No. 4,595,674 (17 Jun 1986).

TSCH87: Tschesche, H, J Beckmann, A Mehlich, E Schnabel, E Truscheit, and HR Wenzel, "Semisynthetic engineering of proteinase inhibitor homologues", Biochimica et Biophysica Acta (1987), 913:97-101.

VAND86: van der Ley, P, M Struyve, and J Tommassen, "Topology of outer membrane pore protein PhoE of Escherichia coli. Identification of cell surface-exposed amino acids with the aid of monoclonal antibodies", J Biol Chem (1986), 261(26)12222-5.

VAND89: Vanderslice, P, CS Craik, JA Nadel, GH Caughey, "Molecular Cloning of Dog Mast Cell Tryptase and a Related Protease: Structural Evidence of a Unique Mode of Serine Protease Activation", Biochem (1989), 28:4148-55.

VAND90: van der Werf, S, A Charbit, C Leclerc, V Mimic, J Ronco, M Girard, and M Hofnung, "Critical role of neighbouring sequences on the immunogenicity of the C3 poliovirus neutralization epitope expressed at the surface of recombinant bacteria", Vaccine (1990), 8(3)269-77.

VERS86a: Vershon, AK, K Blacker, and RT Sauer, "Mutagenesis of the Arc Repressor Using Synthetic Primers with Random Nucleotide Substitutions", pp243-256 in Protein Engineering. Applications in Science, Medicine, and Industry, Academic Press, 1986.

VERS86b: Vershon, AK, JU Bowie, TM Karplus, and RT Sauer, "Isolation and Analysis of Arc Repressor Mutants: Evidence for an Unusual Mechanism of DNA Binding", pp302-311 in Proteins: Structure, Function, and Genetics, Alan R. Liss, Inc., 1986.

VINC72: Vincent &al, Biochem (1972), 11:2967ff.

VINC74: Vincent &al., Biochem (1974), 13:4205.

VITA84: Vita, C, D Dalzoppo, and A Fontana, "Independent Folding of the Carboxyl-Terminal Fragment 228-316 of Thermolysin", Biochemistry (1984), 23:5512-5519.

VOGE86: Vogel, H, and F Jahnig, "Models for the structure of outer membrane proteins of E. coli derived from Raman spectroscopy and prediction methods", J Mol Biol (1986), 190:191-99.

VOND86: Vonderviszt, F, GY Matrai, and I Simon, "Characteristic sequential residue environment of amino acids in proteins", Int J Peptide Protein Res (1986), 27:483-92.

WACH79: Wachter, E, K Hochstrasser, G Bretzel, and S Heindl, "Kunitz-Type Proteinase Inhibitors Derived by Limited Proteolysis of the Inter-α-trypsin Inhibitor, II. Characterization of a Second Inhibitory Inactive Domain by Amino Acid Sequence Determination", Hoppe-Seyler Z Physiol Chem (1979), 360:1297-1303.

WACH80: Wachter, E, K Deppner, and K Hochstrasser, "A New Kunitz-type Inhibitor from Bovine Serum, Amino Acid Sequence Determination.", FEBS Letters (1980), 119:58-62.

WAGN78: Wagner, G, K Wuthrich, and H Tschesche, "A ¹H Nuclear-Magnetic-Resonance Study of the Solution Conformation of the Isoinhibitor K from Helix pomatia.", Eur J Biochem (1978), 89:367-377.

WAGN79: Wagner, G, H Tschesche, and K Wuthrich, "The Influence of Localized Chemical Modifications of the Basic Pancreatic Trypsin Inhibitor on Static and Dynamic Aspects of the Molecular Conformation in Solution", Eur J Biochem (1979), 95:239-248.

WAGN87: Wagner, G, D Bruhwiler, and K Wuthrich, "Reinvestigation of the aromatic side-chains in the basic pancreatic trypsin inhibitor by heteronuclear two-dimensional nuclear magnetic resonance.", J Mol Biol (1987), 196(1)227-31.

WAIT83: Waite, JH, "Evidence for a repeating 3,4-dihydroxyphenylalanine- and hydroxyproline-containing decapeptide in the adhesive protein of the mussel, Mytilus edulis L.", J Biol Chem (1983), 258(5)2911-5.

WAIT85: Waite, JH, TJ Housley, and ML Tanzer, "Peptide repeats in a mussel glue protein: theme and variations.", Biochemistry (1985), 24(19)5010-4.

WAIT86: Waite, JH, "Mussel glue from Mytilus californianus Conrad: a comparative study.", J Comp Physiol [B] (1986), 156(4)491-6.

WATS87: Molecular Biology of the Gene, Fourth Edition, Watson, JD, NH Hopkins, JW Roberts, JA Steitz, and AM Weiner, Benjamin/Cummings Publishing Company, Inc., Menlo Park, Calif., 1987.

WEBS78: Webster, RE, and JS Cashman, "Morphogenesis of the Filamentous Single-stranded DNA Phages.", in The Single-Stranded DNA Phages, Denhardt, DT, D Dressler, and DS Ray editors, Cold Spring Harbor Laboratory, 1978, p557-569.

WEHM89: Wehmeier, U, GA Sprenger, and JW Lengeler, "The use of lambda plac-Mu hybrid phages in Klebsiella pneumoniae and the isolation of stable Hfr strains", Mol Gen Genet (1989), 215(3)529-36.

WEIN83: Weinstock, GM, C ap Rhys, ML Berman, B Hampar, D Jackson, TJ Silhavy, J Weisemann, and M Zweig, "Open reading frame expression vectors: A general method for antigen production in Escherichia coli using protein fusions to beta-galactosidase", Proc Natl Acad Sci USA (1983), 80:4432-4436.

WELL86: Wells, JA, and DB Powers, "In vivo Formation and Stability of Engineered Disulfide Bonds in Subtilisin", J Biol Chem (1986), 261:6564-70.

WELL87a: Wells, JA, BC Cunningham, TP Graycar, and DA Estell, "Recruitment of substrate-specificity properties from one enzyme into a related one by protein engineering", Proc Natl Acad Sci USA (1987), 84:5167-71.

WELL87b: Wells, JA, DB Powers, RR Bott, TP Graycar, and DA Estell, "Designing substrate specificity by protein engineering of electrostatic interactions", Proc Natl Acad Sci USA (1987), 84:1219-23.

WEMM83: Wemmer, D, and NR Kallenbach, Biochem (1983), 22:1901-6.

WENZ80: Wenzel, HR, and H Tschesche, Hoppe-Seyler Z Physiol Chem (1980), 361:345.

WENZ81: Wenzel, HR, and H Tschesche, "`Chemical Mutation` by Amino Acid Exchange in the Reactive Site of a Proteinase Inhibitor and Alteration of Its Inhibitor Specificity", Angew Chem Int Ed Engl (1981), 20(3)295-6.

WETZ88: Wetzel, R, et al., Proc Natl Acad Sci USA (1988), 85:401-5.

WEWE87: Wewers, MD, MA Casolaro, SE Sellers, SC Swayze, KM McPhaul, JT Wittes, and RG Crystal, "Replacement therapy for α-1-antitrypsin deficiency associated with emphysema", New Engl J Med (1987), 316(17)1055-62.

WHAR86: Wharton, RP, The Binding Specificity Determinants of 434 Repressor., Harvard U. PhD Thesis, 1986, University Microfilms, Ann Arbor, Mich.

WIEC85: Wieczorek, M, J Otlewski, J Cook, K Parks, J Leluk, A Wilimowska-Pelc, A Polanowski, T Wilusz, and M Laskowski, Jr, "The Squash Family of Serine Protease Inhibitors. Amino Acid Sequences and association equilibrium constants of inhibitors from squash, summer squash, zucchini, and cucumber seeds", Biochem Biophys Res Comm (1985), 126(2)646-652.

WILK84: Wilkinson, AJ, AR Fersht, DM Blow, P Carter, and G Winter, "A large increase in enzyme-substrate affinity by protein engineering.", Nature (1984), 307:187-188.

WINT87b: Winter, AJ, "Outer membrane proteins of Brucella", Ann Inst Pasteur Microbiol (1987), 138(1)87-9.

WLOD84: Wlodawer, A, J Walter, R Huber, and L Sjolin, "Structure of bovine pancreatic trypsin inhibitor. Results of joint neutron and X-ray refinement of crystal form II.", J Mol Biol (1984), 180(2)301-29.

WLOD87a: Wlodawer, A, J Nachman, GL Gilliland, W Gallagher, and C Woodward, "Structure of form III crystals of bovine pancreatic trypsin inhibitor.", J Mol Biol (1987), 198(3)469-80.

WLOD87b: Wlodawer, A, J Deisenhofer, and R Huber, "Comparison of two highly refined structures of bovine pancreatic trypsin inhibitor.", J Mol Biol (1987), 193(1)145-56.

WOOD90: Woodward, SR, LJ Cruz, BM Olivera, and DR Hillyard, "Constant and hypervariable regions in conotoxin propeptides", EMBO J (1990), 9:1015-1020.

WUNT88: Wun, T-C, KK Kretzmer, TJ Girard, JP Miletich, and GJ Broze, Jr, "Cloning and Characterization of a cDNA Coding for the Lipoprotein-associated Coagulation Inhibitor Shows That It Consists of Three Tandem Kunitz-type Inhibitory Domains", J Biol Chem (1988), 263:6001-4.

YAGE87: Yager, TD, and PH von Hippel, "Transcription Elongation and Termination in E. coli", Volume 2, Chapter 76, p 1241-1275, Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, Neidhardt, FC, Editor-in-Chief, Amer Soc for Microbiology, Washington, DC, 1987.

YANI85: Yanisch-Perron, C, J Vieira, and J Messing, "Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors", Gene, (1985), 33:103-119.

YOKO77: Yokosawa, H, and S-I Ishii, "Anhydrotrypsin: New Features in Ligand Interactions Revealed by Affinity Chromatography and Thionine Replacement", J Biochem (1977), 81:647-56.

YOSH85: Yoshimura, S, H Ikemura, H Watanabe, S Aimoto, Y Shimonishi, S Hara, T Takeda, T Miwatani, and Y Takeda, "Essential structure for full enterotoxigenic activity of heat-stable enterotoxin produced by enterotoxigenic Escherichia coli", FEBS Lett (1985), 181:138-42.

ZAFA88: Zafaralla, GC, C Ramilo, WR Gray, R Karlstrom, BM Olivera, and LJ Cruz, "Phylogenetic specificity of cholinergic ligands: α-conotoxin SI", Biochemistry, (1988), 27(18)7102-5.

ZIMM82: Zimmermann, R, C Watts, and W Wickner, "The Biosynthesis of Membrane-bound M13 Coat Protein: Energetics and Assembly Intermediates.", J Biol Chem (1982), 257:6529-6536.

ZOLL84: Zoller, MJ, and M Smith, "Oligonucleotide-Directed Mutagenesis: A Simple Method Using two Oligonucleotide Primers and a Single-Stranded DNA Template.", DNA (1984), 3(6)479-488.

    __________________________________________________________________________    SEQUENCE LISTING                                                              (1) GENERAL INFORMATION:                                                      (iii) NUMBER OF SEQUENCES: 121                                                (2) INFORMATION FOR SEQ ID NO:1:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:28 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:1:                                        PheXaaCysXaaXaaCysXaaXaa XaaPheXaaXaaXaaXaaXaaLeu                             151015                                                                        XaaXaaHisXaaXaaXaaHisXaaXaaXaaXaaXaa                                          2025                                                                          (2) INFORMATION FOR SEQ ID NO:2:                                               (i) SEQUENCE CHARACTERISTICS:                                                (A) LENGTH:28 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:2:                                        TyrXaaCysXaaXaaCysXaaXaaXaaPheXaaXaaXaaXaaXaaLeu                              1510 15                                                                       XaaXaaHisXaaXaaXaaHisXaaXaaXaaXaaXaa                                          2025                                          
(2) INFORMATION FOR SEQ ID NO:3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
Phe Xaa Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Phe Xaa Xaa Xaa Xaa Xaa
1               5                   10                  15
Leu Xaa Xaa His Xaa Xaa Xaa His Xaa Xaa Xaa Xaa Xaa
            20                  25
(2) INFORMATION FOR SEQ ID NO:4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
Tyr Xaa Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Phe Xaa Xaa Xaa Xaa Xaa
1               5                   10                  15
Leu Xaa Xaa His Xaa Xaa Xaa His Xaa Xaa Xaa Xaa Xaa
            20                  25
(2) INFORMATION FOR SEQ ID NO:5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
Phe Xaa Cys Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Phe Xaa Xaa Xaa Xaa
1               5                   10                  15
Xaa Leu Xaa Xaa His Xaa Xaa Xaa His Xaa Xaa Xaa Xaa Xaa
            20                  25                  30
(2) INFORMATION FOR SEQ ID NO:6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
Tyr Xaa Cys Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Phe Xaa Xaa Xaa Xaa
1               5                   10                  15
Xaa Leu Xaa Xaa His Xaa Xaa Xaa His Xaa Xaa Xaa Xaa Xaa
            20                  25                  30
(2) INFORMATION FOR SEQ ID NO:7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
Xaa Cys Xaa Xaa Xaa Xaa Cys Xaa
1               5
(2) INFORMATION FOR SEQ ID NO:8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
Gly Asn Xaa Cys Xaa Xaa Xaa Xaa Cys Xaa Ser Gly
1               5                   10
(2) INFORMATION FOR SEQ ID NO:9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
Met Lys Lys Ser
(2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
Glu Gly Gly Gly Ser
1               5
(2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
Glu Gly Gly Gly Ser Gly Ser Ser Ser Leu Gly Ser Ser Ser Leu
1               5                   10                  15
(2) INFORMATION FOR SEQ ID NO:12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
Met Gly Asn Gly
1
(2) INFORMATION FOR SEQ ID NO:13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
Ser Asn Thr Leu
1
(2) INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
Gly Gly Gly Ser
1
(2) INFORMATION FOR SEQ ID NO:15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
Glu Gly Gly Gly Thr
1               5
(2) INFORMATION FOR SEQ ID NO:16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
Gly Ser Ser Ser Leu
1               5
(2) INFORMATION FOR SEQ ID NO:17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
Gly Gly Glu Gly Gly Gly Ser Ala Ala Glu Gly
1               5                   10
(2) INFORMATION FOR SEQ ID NO:18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
Glu Gly Gly Gly Ser Gly Ser Ser Ser Leu Gly Ser Ser Ser Leu
1               5                   10                  15
(2) INFORMATION FOR SEQ ID NO:19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
Xaa Xaa Cys Xaa Xaa Xaa Xaa Cys Xaa Xaa
1               5                   10
(2) INFORMATION FOR SEQ ID NO:20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
Xaa Cys Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys
1               5                   10
(2) INFORMATION FOR SEQ ID NO:21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
Xaa Cys Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa
1               5                   10
(2) INFORMATION FOR SEQ ID NO:22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
Xaa Cys Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa
1               5                   10                  15
(2) INFORMATION FOR SEQ ID NO:23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
Xaa Cys Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa
1               5                   10                  15
(2) INFORMATION FOR SEQ ID NO:24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
Xaa Cys Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa
1               5                   10                  15
Xaa
(2) INFORMATION FOR SEQ ID NO:25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
Xaa Cys Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa
1               5                   10                  15
Xaa Xaa
(2) INFORMATION FOR SEQ ID NO:26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
Xaa Xaa Cys Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys
1               5                   10
(2) INFORMATION FOR SEQ ID NO:27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
Xaa Xaa Cys Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa
1               5                   10                  15
(2) INFORMATION FOR SEQ ID NO:28:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
Xaa Xaa Cys Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa
1               5                   10                  15
(2) INFORMATION FOR SEQ ID NO:29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
Xaa Xaa Cys Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa
1               5                   10                  15
Xaa
(2) INFORMATION FOR SEQ ID NO:30:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:
Xaa Xaa Cys Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa
1               5                   10                  15
Xaa Xaa
(2) INFORMATION FOR SEQ ID NO:31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:
Xaa Xaa Cys Cys Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa
1               5                   10                  15
Xaa Xaa Xaa
(2) INFORMATION FOR SEQ ID NO:32:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
Xaa Xaa Cys Cys Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Cys Xaa
1               5                   10                  15
Xaa Xaa Xaa Cys Cys Xaa
            20
(2) INFORMATION FOR SEQ ID NO:33:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:
Arg Asp Cys Cys Thr Pro Pro Lys Lys Cys Lys Asp Arg Gln Cys Lys
1               5                   10                  15
Pro Gln Arg Cys Cys Ala
            20
(2) INFORMATION FOR SEQ ID NO:34:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:
Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Cys
1               5                   10                  15
Xaa Xaa Cys Xaa Xaa Xaa Xaa Cys
            20
(2) INFORMATION FOR SEQ ID NO:35:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:
Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Cys
1               5                   10                  15
Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Cys
            20                  25
(2) INFORMATION FOR SEQ ID NO:36:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:
Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Cys
1               5                   10                  15
Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys
            20                  25
(2) INFORMATION FOR SEQ ID NO:37:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:
Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Cys
1               5                   10                  15
Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Cys
            20                  25
(2) INFORMATION FOR SEQ ID NO:38:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:
Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Cys
1               5                   10                  15
Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys
            20                  25
(2) INFORMATION FOR SEQ ID NO:39:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:
Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys Cys
1               5                   10                  15
Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys
            20                  25
(2) INFORMATION FOR SEQ ID NO:40:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:
His Asn Gly Met Xaa Xaa Xaa Xaa Xaa Xaa His Asn Gly Cys
1               5                   10
(2) INFORMATION FOR SEQ ID NO:41:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:
Cys Asn Gly Met Xaa Xaa Xaa Xaa Xaa Xaa His Asn Gly His
1               5                   10
(2) INFORMATION FOR SEQ ID NO:42:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:
His Gly Pro Xaa Met Xaa Xaa Xaa Xaa Xaa Xaa His Asn Gly Cys
1               5                   10                  15
(2) INFORMATION FOR SEQ ID NO:43:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:
Ser Asp Glu Ala Ser Gly Cys His Tyr Gly Val Leu Thr
1               5                   10
(2) INFORMATION FOR SEQ ID NO:44:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 58 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:
Arg Pro Asp Phe Cys Leu Glu Pro Pro Tyr Thr Gly Pro Cys Lys Ala
1               5                   10                  15
Arg Ile Ile Arg Tyr Phe Tyr Asn Ala Lys Ala Gly Leu Cys Gln Thr
            20                  25                  30
Phe Val Tyr Gly Gly Cys Arg Ala Lys Arg Asn Asn Phe Lys Ser Ala
        35                  40                  45
Glu Asp Cys Met Arg Thr Cys Gly Gly Ala
    50                  55
(2) INFORMATION FOR SEQ ID NO:45:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 58 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:
Arg Pro Asp Phe Cys Leu Glu Pro Pro Tyr Thr Gly Pro Cys Val Ala
1               5                   10                  15
Met Phe Gln Arg Tyr Phe Tyr Asn Ala Lys Ala Gly Leu Cys Gln Thr
            20                  25                  30
Phe Val Tyr Gly Gly Cys Met Gly Asn Gly Asn Asn Phe Lys Ser Ala
        35                  40                  45
Glu Asp Cys Met Arg Thr Cys Gly Gly Ala
    50                  55
(2) INFORMATION FOR SEQ ID NO:46:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 58 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:
Arg Pro Asp Phe Cys Leu Glu Pro Pro Tyr Thr Gly Pro Cys Val Gly
1               5                   10                  15
                                                          PhePheSerArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              202530                                                                        PheValTyrGlyGlyCysMetGlyAsnGlyAsnAsnPheLysS erAla                             354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:47:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                            (D) TOPOLOGY:linear                                                          (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:47:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValGly                              151015                                                                        PhePheGlnArgTyrPheTyrAsnAla LysAlaGlyLeuCysGlnThr                             202530                                                                        PheValTyrGlyGlyCysMetGlyAsnGlyAsnAsnPheLysSerAla                              3540 45                                                                       GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:48:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                   
        (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:48:                                       ArgProAspPheCys LeuGluProProTyrThrGlyProCysValAla                             151015                                                                        MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              20 2530                                                                       PheValTyrGlyGlyCysMetGlyAsnGlyAsnAsnPheLysSerAla                              354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                50 55                                                                         (2) INFORMATION FOR SEQ ID NO:49:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:49:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              1 51015                                                                       IlePheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              202530                                                                        PheValTyrGlyGlyCysM etGlyAsnGlyAsnAsnPheLysSerAla                             354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                5055                                      
                                    (2) INFORMATION FOR SEQ ID NO:50:                                             (i) SEQUENCE CHARACTERISTICS:                                                 ( A) LENGTH:58 amino acids                                                    (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:50:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              151015                                                                        Ile PheLysArgLeuPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                             202530                                                                        PheValTyrGlyGlyCysMetGlyAsnGlyAsnAsnPheLysSerAla                              35 4045                                                                       GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:51:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                     (xi) SEQUENCE DESCRIPTION:SEQ ID NO:51:                                      ArgProAspPheCysLeuGluProProTyrThrGlyProCysIleAla                              151015                                                                        PhePheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGln Thr                             202530        
                                                                PheValTyrGlyGlyCysMetGlyAsnGlyAsnAsnPheLysSerAla                              354045                                                                        GluAspCysMetAr gThrCysGlyGlyAla                                               5055                                                                          (2) INFORMATION FOR SEQ ID NO:52:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:52:                                       ArgProAspPheCysLeuGluProProTyrThr GlyProCysIleAla                             151015                                                                        PhePheGlnArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              2025 30                                                                       PheValTyrGlyGlyCysMetGlyAsnGlyAsnAsnPheLysSerAla                              354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2 ) INFORMATION FOR SEQ ID NO:53:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                      
              (xi) SEQUENCE DESCRIPTION:SEQ ID NO:53:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysIleAla                              15 1015                                                                       LeuPheLysArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              202530                                                                        PheValTyrGlyGlyCysMetGlyAsnGlyAsnAsnP heLysSerAla                             354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:54:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                            (D) TOPOLOGY:linear                                                          (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:54:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysMetGly                              151015                                                                        PheSerLysArgTyrPheTyr AsnAlaLysAlaGlyLeuCysGlnThr                             202530                                                                        PheValTyrGlyGlyCysArgAlaLysArgAsnAsnPheLysSerAla                              3540 45                                                                       GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:55:                                             (i) SEQUENCE CHARACTERISTICS:       
                                          (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:55:                                       ArgProAsp PheCysLeuGluProProTyrThrGlyProCysMetAla                             151015                                                                        LeuPheLysArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              20 2530                                                                       PheValTyrGlyGlyCysArgAlaLysArgAsnAsnPheLysSerAla                              354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                 5055                                                                         (2) INFORMATION FOR SEQ ID NO:56:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:56:                                       ArgProAspPheCysLeuGluProProAsnThrGlyProCysPheAla                               151015                                                                       IleThrProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              202530                                                                        PheValTyrGlyG lyCysArgAlaLysArgAsnAsnPheLysSerAla                             354045  
                                                                      GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:57:                                             (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH:58 amino acids                                                    (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:57:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysMetAla                              15101 5                                                                       LeuPheGlnArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              202530                                                                        PheValTyrGlyGlyCysArgAlaLysArgAsnAsnPheLysSerAla                               354045                                                                       GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:58:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii ) MOLECULE TYPE:protein                                                   (xi) SEQUENCE DESCRIPTION:SEQ ID NO:58:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysMetAla          
                    151015                                                                        IleSerProArgTyrPheTyrAsnAlaLysAlaGlyLeu CysGlnThr                             202530                                                                        PheValTyrGlyGlyCysArgAlaLysArgAsnAsnPheLysSerAla                              354045                                                                        GluAspCy sMetArgThrCysGlyGlyAla                                               5055                                                                          (2) INFORMATION FOR SEQ ID NO:59:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:59:                                       ArgProAspPheCysLeuGluProPro TyrThrGlyProCysValAla                             151015                                                                        MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              2025 30                                                                       PheLeuTyrGlyGlyCysLysGlyLysGlyAsnAsnPheLysSerAla                              354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                50 55                                                                         (2) INFORMATION FOR SEQ ID NO:60:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid           
                                                (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:60:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              15 1015                                                                       MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              202530                                                                        PheGluTyrGlyGlyCysTrpAlaLysGlyA snAsnPheLysSerAla                             354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:61:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                      (B) TYPE:amino acid                                                          (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:61:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              151015                                                                        MetPheProArgTyr PheTyrAsnAlaLysAlaGlyLeuCysGlnThr                             202530                                                                        PheGlyTyrAlaGlyCysArgAlaLysGlyAsnAsnPheLysSerAla                              3540 45                                                                       GluAspCysMetArgThrCysGlyGlyAla                                                
5055                                                                          (2) INFORMATION FOR SEQ ID NO:62:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:62:                                       Arg ProAspPheCysLeuGluProProTyrThrGlyProCysValAla                             151015                                                                        MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                               202530                                                                       PheGluTyrGlyGlyCysHisAlaGluGlyAsnAsnPheLysSerAla                              354045                                                                        GluAspCysMetArgThrCysGlyGl yAla                                               5055                                                                          (2) INFORMATION FOR SEQ ID NO:63:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:63:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysVal Ala                             151015                                                                        MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr  
                            202530                                                                        PheLeuT yrGlyGlyCysTrpAlaGlnGlyAsnAsnPheLysSerAla                             354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:64:                                             (i ) SEQUENCE CHARACTERISTICS:                                                (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:64:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              1510 15                                                                       MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              202530                                                                        PheArgTyrGlyGlyCysLeuAlaGluGlyAsnAsnPheLysSerAla                               354045                                                                       GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:65:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                            (ii) MOLECULE 
TYPE:protein                                                   (xi) SEQUENCE DESCRIPTION:SEQ ID NO:65:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              151015                                                                        MetPheProArgTyrPheTyrAsnAlaLysAla GlyLeuCysGlnThr                             202530                                                                        PheAspTyrGlyGlyCysHisAlaAspGlyAsnAsnPheLysSerAla                              354045                                                                        Gl uAspCysMetArgThrCysGlyGlyAla                                               5055                                                                          (2) INFORMATION FOR SEQ ID NO:66:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:66:                                       ArgProAspPheCysLeuGlu ProProTyrThrGlyProCysValAla                             151015                                                                        MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              2025 30                                                                       PheLysTyrGlyGlyCysLeuAlaHisGlyAsnAsnPheLysSerAla                              354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                50 55                                                                         (2) INFORMATION FOR SEQ ID NO:67:                                
             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:67:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              15 1015                                                                       MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              202530                                                                        PheThrTyrGlyGlyCysTrpAlaAs nGlyAsnAsnPheLysSerAla                             354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:68:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                      (B) TYPE:amino acid                                                          (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:68:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              151015                                                                        MetPheProA rgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                             202530                                                                        
PheAsnTyrGlyGlyCysGluGlyLysGlyAsnAsnPheLysSerAla                              35 4045                                                                       GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:69:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:69:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              151015                                                                        MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                               202530                                                                       PheGlnTyrGlyGlyCysGluGlyTyrGlyAsnAsnPheLysSerAla                              354045                                                                        GluAspCysMetArgThrCys GlyGlyAla                                               5055                                                                          (2) INFORMATION FOR SEQ ID NO:70:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:70:           
                            ArgProAspPheCysLeuGluProProTyrThrGlyPro CysValAla                             151015                                                                        MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              202530                                                                        Ph eGlnTyrGlyGlyCysLeuGlyGluGlyAsnAsnPheLysSerAla                             354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:71:                                              (i) SEQUENCE CHARACTERISTICS:                                                (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:71:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              1510 15                                                                       MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              202530                                                                        PheHisTyrGlyGlyCysTrpGlyGlnGlyAsnAsnPheLysSe rAla                             354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:72:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino 
acids                                                     (B) TYPE:amino acid                                                            (D) TOPOLOGY:linear                                                          (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:72:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              151015                                                                        MetPheProArgTyrPheTyrAsnAlaL ysAlaGlyLeuCysGlnThr                             202530                                                                        PheHisTyrGlyGlyCysTrpGlyGluGlyAsnAsnPheLysSerAla                              3540 45                                                                       GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:73:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:73:                                       ArgProAspPheCys LeuGluProProTyrThrGlyProCysValAla                             151015                                                                        MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              20 2530                                                                       PheLysTyrGlyGlyCysTrpGlyLysGlyAsnAsnPheLysSerAla                              354045                                                                
        GluAspCysMetArgThrCysGlyGlyAla                                                50 55                                                                         (2) INFORMATION FOR SEQ ID NO:74:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:58 amino acids                                                     (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:74:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              1 51015                                                                       MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr                              202530                                                                        PheLysTyrGlyGlyCysHi sGlyAsnGlyAsnAsnPheLysSerAla                             354045                                                                        GluAspCysMetArgThrCysGlyGlyAla                                                5055                                                                          (2) INFORMATION FOR SEQ ID NO:75:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A ) LENGTH:58 amino acids                                                    (B) TYPE:amino acid                                                           (D) TOPOLOGY:linear                                                           (ii) MOLECULE TYPE:protein                                                    (xi) SEQUENCE DESCRIPTION:SEQ ID NO:75:                                       ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla                              151015                                    
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheProTyrGlyGlyCysTrpAlaLysGlyAsnAsnPheLysLeuAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:76:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:58 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:76:
ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla
1 5 10 15
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheLysTyrGlyGlyCysTrpGlyHisGlyAsnAsnPheLysSerAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:77:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:58 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:77:
ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla
1 5 10 15
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheAsnTyrGlyGlyCysTrpGlyLysGlyAsnAsnPheLysSerAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:78:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:58 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:78:
ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla
1 5 10 15
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheThrTyrGlyGlyCysLeuGlyHisGlyAsnAsnPheLysSerAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:79:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:58 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:79:
ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla
1 5 10 15
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheThrTyrGlyGlyCysLeuGlyTyrGlyAsnAsnPheLysSerAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:80:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:58 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:80:
ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla
1 5 10 15
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheLysTyrGlyGlyCysTrpAlaGluGlyAsnAsnPheLysSerAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:81:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:58 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:81:
ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla
1 5 10 15
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheGlyTyrGlyGlyCysTrpGlyGluGlyAsnAsnPheLysSerAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:82:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:58 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:82:
ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla
1 5 10 15
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheGluTyrGlyGlyCysTrpAlaAsnGlyAsnAsnPheLysSerAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:83:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:58 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:83:
ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla
1 5 10 15
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheValTyrGlyGlyCysHisGlyAspGlyAsnAsnPheLysSerAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:84:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:58 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:84:
ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla
1 5 10 15
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheMetTyrGlyGlyCysGlnGlyLysGlyAsnAsnPheLysSerAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:85:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:58 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:85:
ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla
1 5 10 15
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheTyrTyrGlyGlyCysTrpAlaLysGlyAsnAsnPheLysSerAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:86:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:58 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:86:
ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla
1 5 10 15
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheMetTyrGlyGlyCysTrpGlyAspGlyAsnAsnPheLysSerAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:87:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:58 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:87:
ArgProAspPheCysLeuGluProProTyrThrGlyProCysValAla
1 5 10 15
MetPheProArgTyrPheTyrAsnAlaLysAlaGlyLeuCysGlnThr
20 25 30
PheThrTyrGlyGlyCysHisGlyAsnGlyAsnAsnPheLysSerAla
35 40 45
GluAspCysMetArgThrCysGlyGlyAla
50 55
(2) INFORMATION FOR SEQ ID NO:88:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:6 amino acids
(B) TYPE:amino acid
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:protein
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:88:
XaaXaaXaaXaaXaaXaa
1 5
(2) INFORMATION FOR SEQ ID NO:89:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:24 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:89:
NNTTGTNNTNNGNNGNNTTGTNNT 24
XaaCysXaaXaaXaaXaaCysXaa
1 5
(2) INFORMATION FOR SEQ ID NO:90:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:13 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:other nucleic acid
(A) DESCRIPTION:synthetic DNA fragment
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:90:
CCGTCGAATCCGC 13
(2) INFORMATION FOR SEQ ID NO:91:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:13 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:other nucleic acid
(A) DESCRIPTION:synthetic DNA fragment
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:91:
GCGGATTTGACGG 13
(2) INFORMATION FOR SEQ ID NO:92:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:16 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:single
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:other nucleic acid
(A) DESCRIPTION:synthetic DNA fragment
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:92:
CGTAACCTCGTCATTA 16
(2) INFORMATION FOR SEQ ID NO:93:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:16 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:single
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:other nucleic acid
(A) DESCRIPTION:synthetic DNA fragment
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:93:
CCGTAGGTACCTACGG 16
(2) INFORMATION FOR SEQ ID NO:94:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:15 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:other nucleic acid
(A) DESCRIPTION:synthetic DNA fragment
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:94:
CACGGCTATTACGGT 15
(2) INFORMATION FOR SEQ ID NO:95:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:12 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:linear
(ii) MOLECULE TYPE:other nucleic acid
(A) DESCRIPTION:synthetic DNA fragment
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:95:
ACCGTAATAGCC 12
(2) INFORMATION FOR SEQ ID NO:96:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:20 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:96:
ACTTCCTCATGAAAAAGTCT 20
ThrSerSer
1
(2) INFORMATION FOR SEQ ID NO:97:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:20 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:97:
ACTTCCTCATGAAAAAGTCT 20
MetLysLysSer
1
(2) INFORMATION FOR SEQ ID NO:98:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:20 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:98:
ACTTCCAGCTGAAAAAGTCT 20
ThrSerSer
1
(2) INFORMATION FOR SEQ ID NO:99:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:20 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:99:
ACTTCCAGCTGAAAAAGTCT 20
MetLysLysSer
1
(2) INFORMATION FOR SEQ ID NO:100:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:16 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:100:
CGAGGGAGGAGGATCC 16
GluGlyGlyGlySer
1 5
(2) INFORMATION FOR SEQ ID NO:101:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:16 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:101:
CGGATCCTCCTCCCTC 16
GlySerSerSerLeu
1 5
(2) INFORMATION FOR SEQ ID NO:102:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:33 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:102:
GGTGGCGAGGGAGGAGGATCCGCCGCTGAAGGT 33
GlyGlyGluGlyGlyGlySerAlaAlaGluGly
1 5 10
(2) INFORMATION FOR SEQ ID NO:103:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:21 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:103:
GGCGGATCCTCCTCCCTCGCC 21
GlyGlySerSerSerLeuAla
1 5
(2) INFORMATION FOR SEQ ID NO:104:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:20 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:104:
GCGAGGGAGGAGGATCCGCC 20
GluGlyGlyGlySerAla
1 5
(2) INFORMATION FOR SEQ ID NO:105:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:52 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:105:
GGCGAGGGAGGAGGATCCGGATCCTCCTCCCTCGGATCCTCCTCC 45
GlyGluGlyGlyGlySerGlySerSerSerLeuGlySerSerSer
1 5 10 15
CTCGCCC 52
LeuAla
(2) INFORMATION FOR SEQ ID NO:106:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:18 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:106:
RVTVYTRRSVHGVHGRMG 18
XaaXaaXaaXaaXaaXaa
1 5
(2) INFORMATION FOR SEQ ID NO:107:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:12 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:107:
VYTVNTNNKVWG 12
XaaXaaXaaXaa
1
(2) INFORMATION FOR SEQ ID NO:108:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:27 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:108:
CCTTGCGTGGCTATGTTCCAACGCTAT 27
ProCysValAlaMetPheGlnArgTyr
1 5
(2) INFORMATION FOR SEQ ID NO:109:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:27 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:109:
CCTTGCGTCGGTTTCTTCTCACGCTAT 27
ProCysValGlyPhePheSerArgTyr
1 5
(2) INFORMATION FOR SEQ ID NO:110:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH:27 base pairs
(B) TYPE:nucleic acid
(C) STRANDEDNESS:double
(D) TOPOLOGY:circular
(ii) MOLECULE TYPE:genomic DNA
(xi) SEQUENCE DESCRIPTION:SEQ ID NO:110:
CCTTGCGTCGGTTTCTTCCAACGCTAT 27
ProCysValGlyPhePheGlnArgTyr
1 5
(2) INFORMATION FOR SEQ ID NO:111:
(i) SEQUENCE CHARACTERISTICS:
                            (A) LENGTH:27 base pairs                                                      (B) TYPE:nucleic acid                                                         (C) STRANDEDNESS:double                                                       (D) TOPOLOGY:circular                                                          (ii) MOLECULE TYPE:genomic DNA                                               (xi) SEQUENCE DESCRIPTION:SEQ ID NO:111:                                      CCTTGCGTCGCTATGTTCCCACGCTAT27                                                 ProCysValAlaMetPheProArgTyr                                                   15                                                                            (2) INFORMATION FOR SEQ ID NO:112:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:27 base pairs                                                      (B) TYPE:nucleic acid                                                         (C) STRANDEDNESS:double                                                        (D) TOPOLOGY:circular                                                        (ii) MOLECULE TYPE:genomic DNA                                                (xi) SEQUENCE DESCRIPTION:SEQ ID NO:112:                                      CCTTGCGTCGCTATCTTCCCACGCTAT27                                                 ProCysValAlaIlePheProArgTyr                                                   15                                                                            (2) INFORMATION FOR SEQ ID NO:113:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:27 base pairs                                                      ( B) TYPE:nucleic acid                                                        (C) STRANDEDNESS:double                                                       (D) TOPOLOGY:circular 
                                                        (ii) MOLECULE TYPE:genomic DNA                                                (xi) SEQUENCE DESCRIPTION:SEQ ID NO:113:                                      CCTTGCGTCGCTATCTTCAAACGCTCT27                                                 ProCysValAlaIlePheLysArgTyr                                                   15                                                                            (2) INFORMATION FOR SEQ ID NO:114:                                            (i) SEQUENCE CHARACTERISTICS:                                                  (A) LENGTH:27 base pairs                                                     (B) TYPE:nucleic acid                                                         (C) STRANDEDNESS:double                                                       (D) TOPOLOGY:circular                                                         (ii) MOLECULE TYPE:genomic DNA                                                (xi) SEQUENCE DESCRIPTION:SEQ ID NO:114:                                      CCTTGCATCGCTTTCTTCCCACGCTAT27                                                 ProCysIleAlaPhePheProArgTyr                                                   15                                                                            (2) INFORMATION FOR SEQ ID NO:115:                                             (i) SEQUENCE CHARACTERISTICS:                                                (A) LENGTH:27 base pairs                                                      (B) TYPE:nucleic acid                                                         (C) STRANDEDNESS:double                                                       (D) TOPOLOGY:circular                                                         (ii) MOLECULE TYPE:genomic DNA                                                (xi) SEQUENCE DESCRIPTION:SEQ ID NO:115:                                      CCTTGCATCGCTTTCTTCCAACGCTAT27                                           
      ProCysIleAlaPhePheGlnArgTyr                                                   1 5                                                                           (2) INFORMATION FOR SEQ ID NO:116:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:27 base pairs                                                      (B) TYPE:nucleic acid                                                         (C) STRANDEDNESS:double                                                       (D) TOPOLOGY:circular                                                         (ii) MOLECULE TYPE:genomic DNA                                                (xi) SEQUENCE DESCRIPTION:SEQ ID NO:116:                                      CCTTGCATCGCTTTGTTCAAACGCTAT27                                                 ProCysIleAlaLeuPheLysA rgTyr                                                  15                                                                            (2) INFORMATION FOR SEQ ID NO:117:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:15 base pairs                                                      (B) TYPE:nucleic acid                                                         (C) STRANDEDNESS:double                                                       (D) TOPOLOGY:circular                                                         (ii) MOLECULE TYPE:genomic DNA                                                (xi) SEQUENCE DESCRIPTION:SEQ ID NO:117:                                      ATGGGTTTCTCCAAA15                                                             MetGlyPheSerLys                                                               1 5                                                                           (2) INFORMATION FOR SEQ ID NO:118:                                            (i) SEQUENCE CHARACTERISTICS:               
                                  (A) LENGTH:15 base pairs                                                      (B) TYPE:nucleic acid                                                         (C) STRANDEDNESS:double                                                       (D) TOPOLOGY:circular                                                         (ii) MOLECULE TYPE:genomic DNA                                                (xi) SEQUENCE DESCRIPTION:SEQ ID NO:118:                                      ATGGCTTTGTTCAAA15                                                             MetAlaLeuPheLys                                                               1 5                                                                           (2) INFORMATION FOR SEQ ID NO:119:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:15 base pairs                                                      (B) TYPE:nucleic acid                                                         (C) STRANDEDNESS:double                                                       (D) TOPOLOGY:circular                                                         (ii) MOLECULE TYPE:genomic DNA                                                (xi) SEQUENCE DESCRIPTION:SEQ ID NO:119:                                      TTCGCTATCACCCCA15                                                             PheAlaIleThrPro                                                               15                                                                            (2) INFORMATION FOR SEQ ID NO:120:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH:15 base pairs                                                      (B) TYPE:nucleic acid                                                         (C) STRANDEDNESS:double                                                       (D) 
TOPOLOGY:circular                                                         (ii) MOLECULE TYPE:genomic DNA                                                (xi) SEQUENCE DESCRIPTION:SEQ ID NO:120:                                      ATGGCTTTGTTCCAA15                                                             MetAlaLeuPheGln                                                               15                                                                            (2) INFORMATION FOR SEQ ID NO:121:                                             (i) SEQUENCE CHARACTERISTICS:                                                (A) LENGTH:15 base pairs                                                      (B) TYPE:nucleic acid                                                         (C) STRANDEDNESS:double                                                       (D) TOPOLOGY:circular                                                         (ii) MOLECULE TYPE:genomic DNA                                                (xi) SEQUENCE DESCRIPTION:SEQ ID NO:121:                                      ATGGCTATCTCCCCA15                                                             MetAlaIleSerPro                                                               15                                                                        
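
The pairing of each nucleotide sequence with the amino acid sequence printed beneath it can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code and reading from the first base; the names CODON_TABLE, THREE_LETTER and translate are hypothetical helpers introduced only for this illustration and are not part of the listing.

```python
# Illustrative sketch only: translate a listed DNA sequence with the
# standard genetic code and print it in the three-letter style used above.
# All names here are hypothetical helpers, not part of the sequence listing.

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering ("*" marks a stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "***",
}


def translate(dna: str) -> str:
    """Translate DNA (length a multiple of 3) into three-letter residues."""
    dna = dna.replace(" ", "").upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return "".join(THREE_LETTER[CODON_TABLE[codon]] for codon in codons)


if __name__ == "__main__":
    # SEQ ID NO:109 and SEQ ID NO:117 as given in the listing above.
    print(translate("CCTTGCGTCGGTTTCTTCTCACGCTAT"))
    print(translate("ATGGGTTTCTCCAAA"))
```

Comparing the returned string with the amino acid line of an entry gives a quick consistency check on that entry.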

We claim:
 1. A method of obtaining a nucleic acid encoding a binding protein having a proteinaceous binding domain that binds a predetermined target material, said target being a substance other than an antibody with an exposed antigen-combining site, comprising: a) preparing a variegated population of amplifiable genetic packages, said genetic packages being selected from the group consisting of cells, spores and viruses, each said genetic package being genetically alterable and having an outer surface including a genetically determined outer surface protein, each package including a first nucleic acid construct coding for a chimeric potential binding protein, each said chimeric protein comprising and each said construct comprising DNA encoding (i) a potential binding domain which is a mutant of a predetermined domain of a predetermined parental protein other than a single chain antibody, and (ii) an outer surface transport signal for obtaining the display of the potential binding domain on the outer surface of the genetic package, the expression of which construct results in the display of said chimeric potential binding protein and its potential binding domain on the outer surface of said genetic package; and wherein said variegated population of genetic packages collectively display a plurality of different potential binding domains, the differentiation among said plurality of different potential binding domains occurring through the at least partially random variation of one or more predetermined amino acid positions of said parental binding domain to randomly obtain at each said position an amino acid belonging to a predetermined set of two or more amino acids, the amino acids of said set occurring at said position in statistically predetermined expected proportions, said genetic packages being amplifiable in cell culture and separable on the basis of the potential binding domain displayed thereon, b) causing the expression of said chimeric potential binding proteins and the display of said potential binding domains on the outer surface of said packages; c) contacting said packages with the predetermined target material such that said potential binding domains and the target material may interact; d) separating packages displaying a potential binding domain that binds the target material from packages that do not so bind, and e) recovering at least one package displaying on its outer surface a chimeric binding protein comprising a successful binding domain (SBD) which bound said target, said package comprising nucleic acid encoding said successful binding domain, and amplifying said SBD-encoding nucleic acid in vivo or in vitro.
 2. The method of claim 1 wherein the potential binding proteins are intergeneric chimeric proteins.
 3. The method of claim 1 wherein said population is characterized by the display of at least 10⁵ different potential binding domains.
 4. The method of claim 1 wherein the parental protein is a single domain protein.
 5. The method of claim 1 wherein, for any potentially encoded potential binding domain, the probability that it will be displayed by at least one package in said population is at least 50%.
 6. The method of claim 5 wherein, for any potentially encoded potential binding domain, the probability that it will be displayed by at least one package in said population is at least 90%.
 7. The method of claim 1, wherein the method further comprises (i) isolating from the first nucleic acid construct of a package bearing a successful binding domain, a nucleic acid fragment consisting essentially of DNA encoding said successful binding domain, or (ii) determining enough of the DNA sequence of the first nucleic acid construct of (i) above to deduce the amino acid sequence of the successful binding domain and then preparing a second nucleic acid construct comprising DNA encoding the successful binding domain, said first and second nucleic acid constructs encoding different proteins.
 8. The method of claim 7 wherein the second nucleic acid construct encodes a protein consisting essentially of said successful binding domain.
 9. The method of claim 1 wherein, in said step (a), the differentiation among said potential binding domains of said variegated population is limited to no more than about 20 predetermined amino acid residues of said sequence and where the permissible variation at said alterable residue positions is also predetermined.
 10. The method of claim 9 wherein the initially chosen parental binding domain has a melting point of at least 50° C.
 11. The method of claim 9 wherein the initially chosen parental potential binding domain is selected from the group consisting of (a) binding domain of bovine pancreatic trypsin inhibitor, crambin, Cucurbita maxima trypsin inhibitor III, heat-stable enterotoxin of Escherichia coli, α Conotoxin GI, μ Conotoxin GIII, ω Conotoxin GIV, apamin, charybdotoxin, secretory leukocyte protease inhibitor, cystatin, eglin, barley protease inhibitor, ovomucoid, T4 lysozyme, hen egg white lysozyme, ribonuclease, azurin, tumor necrosis factor, and CD4, and (b) domains at least substantially homologous with any of the foregoing domains which have a melting point of at least 50° C.
 12. The method of claim 9, further comprising determining which amino acid residues of said parental binding domain are likely to lie on the surface of said domain and limiting the variegation to codons encoding such amino acids.
 13. The method of claim 9, said target material comprising one or more discrete molecules, said parental potential binding domain being characterized as a sequence of amino acids, further comprising identifying an interaction set of amino acids which are on the surface of the parental potential binding domain and which can all simultaneously touch a single molecule of the target material, and obtaining a variegated population wherein for each amino acid residue in said interaction set there is at least one potential binding domain wherein a different amino acid is substituted therefor, said interaction set comprising from about eight to about sixteen residues.
 14. The method of claim 9 wherein the initially chosen parental binding domain is not a binding domain of a binding protein for the predetermined target.
 15. The method of claim 9 wherein the initially chosen parental binding domain contains no more than 30 residues and at least 2 disulfides.
 16. The method of claim 9 wherein the initially chosen parental binding domain contains no more than 60 residues and at least 3 disulfides.
 17. The method of claim 9 wherein the initially chosen parental binding domain contains no more than 80 residues and at least 4 disulfides.
 18. The method of claim 9, further comprising the steps of: (f) determining the amino acid sequence of a successful binding domain and (g) preparing a new variegated population of replicable genetic packages according to said step (a), the parental binding domain for the potential binding domains of said new packages being a successful binding domain whose sequence was determined in step (f), and repeating steps (b)-(e) with said new population.
 19. The method of claim 18 wherein a potential binding domain is considered successful only if it has at least a certain predetermined degree of affinity for target material, the required degree of affinity being increased for each new variegated population, and wherein each new variegated population displays its parental binding domain in detectable amounts.
 20. The method of claim 1 wherein the replicable genetic package is a bacterial cell and said DNA construct further comprises a periplasmic secretion signal sequence.
 21. The method of claim 20 wherein the bacterial cell is selected from the group consisting of strains of Escherichia coli, Salmonella typhimurium, Pseudomonas aeruginosa, Klebsiella pneumoniae, Neisseria gonorrhoeae, and Bacillus subtilis.
 22. The method of claim 21 wherein the outer surface transport signal is derived from a bacterial outer surface protein selected from the group consisting of the lamB protein, OmpA, OmpC, OmpF, Phospholipase A, and pilin, or an assemblable segment thereof.
 23. The method of claim 22 wherein the chimeric surface protein substantially corresponds to LamB with the foreign potential binding domain inserted at 153.
 24. The method of claim 22 wherein the chimeric potential binding protein substantially corresponds to the first 153 amino acids of LamB fused in frame to the potential binding domain.
 25. The method of claim 1 wherein the replicable genetic package is a bacterial spore.
 26. The method of claim 25 wherein the bacterial spore is a Bacillus endospore.
 27. The method of claim 26 wherein the replicable genetic package is an endospore of a strain of B. subtilis.
 28. The method of claim 27 wherein the outer surface transport signal is the cotA, cotB, cotC or cotD protein or an assemblable segment thereof.
 29. A method of producing a binding protein which binds a predetermined target material which comprises: (i) obtaining, by the method of claim 1, a first nucleic acid construct encoding a chimeric binding protein having a binding domain which binds the predetermined target material, and (ii) producing either said chimeric binding protein, or a second binding protein having essentially the same binding domain.
 30. The method of claim 29 wherein the sequence of the binding domain is determined by sequencing at least a portion of either said first nucleic acid construct or said chimeric binding protein, and the determined sequence is then used to guide production of said second binding protein.
 31. The method of claim 30 wherein said second binding protein is produced by expressing a second DNA construct derived at least in part from the first nucleic acid construct.
 32. The method of claim 29 wherein the second binding protein is produced by recombinant DNA techniques.
 33. The method of claim 29 wherein the second binding protein is produced by nonbiological peptide synthesis techniques.
 34. The method of claim 29 wherein the second binding protein consists essentially of said successful binding domain.
 35. The method of claim 1 wherein the genetic package is a single-stranded DNA bacteriophage other than bacteriophage lambda and the package is replicated in a bacterial host cell.
 36. The method of claim 35 wherein said construct further comprises a cytoplasmic secretion signal sequence which codes for a signal peptide which directs the immediate expression product to the inner membrane of the bacterial host cell infected by said phage where it is processed to remove said signal peptide, yielding a mature chimeric protein comprising the potential binding domain and at least a portion of a coat protein of the phage, said chimeric protein being assembled with wild-type coat protein into the phage coat so that said phage displays the potential binding domain on the surface of its coat.
 37. The method of claim 36 wherein the secretion signal sequence is derived from a first gene, and the nucleic acid sequence encoding the outer surface transport signal is derived from a second gene, the first and second genes being different.
 38. The method of claim 36 wherein the secretion signal is encoded by a signal sequence selected from the group consisting of the signal sequences of the phoA, bla and geneIII genes.
 39. The method of claim 35 wherein the replicable genetic package is a filamentous phage.
 40. The method of claim 39 wherein the outer surface transport signal is provided by the major coat protein of a filamentous phage or an assemblable fragment thereof.
 41. The method of claim 39 wherein the outer surface transport signal is provided by the gene III protein of a filamentous phage or an assemblable fragment thereof.
 42. The method of claim 1 wherein the population of replicable genetic packages of step (a) is obtained by: i) preparing a variegated population of DNA inserts, said inserts comprising a plurality of variegated codons, which collectively encode a plurality of different potential binding domains, and ii) incorporating the resulting population of DNA inserts into the chosen replicable genetic packages to produce a variegated population of replicable genetic packages.
 43. The method of claim 42 in which at least one variegated codon is a simply variegated codon selected from the group consisting of NNT, NNG, RNG, RMG, VNT, RRS, and SNT.
 44. The method of claim 42 wherein none of the variegated codons is a simply variegated codon selected from the group consisting of NNN, NNK and NNS.
 45. The method of claim 42 wherein for at least one variegated codon the ratio of amino acids encoded to possible trinucleotide sequences is at least 2:3.
 46. The method of claim 42 wherein for at least one variegated codon there are at least four equally most-favored amino acids.
 47. The method of claim 42 wherein the ratio of amino acid sequences encoded to possible polynucleotide sequences is at least 1:3.
 48. The method of claim 42 in which at least one variegated codon is a complexly variegated codon.
 49. The method of claim 48 in which the complexly variegated codon is prepared so as to yield a ratio of most favored amino acid to least favored amino acid which is less than 2.6.
 50. The method of claim 48 wherein the distribution of nucleotides incorporated at said complexly variegated codon is further chosen to yield the largest value for the quantity.
 51. The method of claim 50 wherein the distribution of nucleotides incorporated at said variegated codon is chosen to yield substantially equal abundances of acidic and basic amino acids.
 52. The method of claim 50 wherein at least one variegated codon provides at least ten different amino acids at not less than 5% abundance.
 53. The method of claim 1, said potential binding domain being a mini-protein sequence of less than about sixty amino acids and having at least one intrachain covalent crosslink between a first amino acid position and a second amino acid position thereof, the amino acids at said first and second positions being invariant in all of the chimeric proteins displayed by said population.
 54. The method of claim 53 wherein the crosslink is a disulfide bond and the amino acids at the first and second amino acid positions are cysteines.
 55. The method of claim 54 in which the mini-protein domain has a single disulfide bond and the span of the bond is not more than nine amino acid residues.
 56. The method of claim 54 wherein the mini-protein domain has a disulfide bond which bridges a sequence of amino acids which under affinity separation conditions collectively assume a hairpin supersecondary structure.
 57. The method of claim 56 wherein the hairpin secondary structure is selected from the group consisting of (a) an α helix, a turn, and a β strand; (b) an α helix, a turn, and an α helix, and (c) a β strand, turn, and a β strand.
 58. The method of claim 54 wherein the mini-protein domain comprises a plurality of intrachain disulfide bonds.
 59. The method of claim 58 wherein the mini-protein domain substantially corresponds in sequence to a mini-protein selected from the group consisting of Escherichia coli heat stable toxin I (ST_A), the bee venom apamin, or a squash-seed trypsin inhibitor, the scorpion toxin, charybdotoxin and secretory leukocyte protease inhibitor.
 60. The method of claim 58 wherein the mini-protein domain has two disulfide bonds having a connectivity pattern of 1-3, 2-4.
 61. The method of claim 60 wherein the mini-protein domain substantially corresponds in sequence to an α-conotoxin.
 62. The method of claim 58 wherein the mini-protein domain has three disulfide bonds having a connectivity pattern of 1-4, 2-5, 3-6.
 63. The method of claim 62 wherein the mini-protein domain substantially corresponds in sequence to a mu- or omega-conotoxin.
 64. In a process for developing novel binding proteins, other than single chain antibodies, with a desired binding activity against a particular target material, other than the antigen-binding domains of antibodies, by mutagenesis of a gene encoding a known protein other than a single chain antibody, the improvement comprising displaying a proteinaceous potential binding domain on the outer surface of an amplifiable genetic package selected from the group consisting of cells, spores and viruses, each said genetic package being genetically alterable, said potential binding domain not being natively associated with the outer surface of said package, said package containing the gene encoding said binding domain and means for directing said domain to the outer surface of said package, contacting the package with the target material, and determining whether the package displaying the potential binding domain binds to said target material.
 65. A variegated population of replicable genetic packages, each package including a nucleic acid construct coding for a chimeric potential binding protein, each said construct comprising DNA encoding (i) a potential binding domain which is a mutant of a predetermined parental binding domain, and (ii) an outer surface transport signal for obtaining the display of the potential binding domain on the outer surface of the genetic package, wherein said initial binding domain is not a single chain antibody and is not identical to or substantially homologous with a binding domain natively associated with said transport signal, and wherein said variegated population of genetic packages collectively display a plurality of different potential binding domains, the differentiation among said plurality of different potential binding domains occurring through the at least partially random variation of one or more predetermined amino acid positions of said parental binding domain to randomly obtain at each said position an amino acid belonging to a predetermined set of two or more amino acids, the amino acids of said set occurring at said position in predetermined expected proportions.
 66. A variegated population of DNA molecules encoding chimeric binding proteins, each said chimeric binding protein comprising (i) a binding domain, and (ii) at least a segment of an outer surface protein of a cell or virus, said segment acting to cause the display of the chimeric binding protein or a processed form thereof on the outer surface of the cell or virus, said binding domain being capable of binding to a target material to which said outer surface protein is not capable of preferentially binding, wherein said variegated population of DNA molecules encode chimeric binding proteins which collectively include a plurality of different binding domains, the differentiation among said plurality of different potential binding domains occurring through the at least partially random variation of one or more predetermined amino acid positions thereof to randomly obtain at each said position an amino acid belonging to a predetermined set of two or more amino acids, the amino acids of said set occurring at said position in predetermined expected proportions.
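
The codon-level quantities recited in the claims on variegated codons (number of amino acids encoded, number of trinucleotide sequences, ratio of most favored to least favored amino acid) and the display probability recited in claims 5 and 6 can be illustrated numerically. The following is a minimal sketch under two assumptions that are not stated in the claims: each ambiguous position is an equimolar mixture of its allowed bases (so it covers only simply variegated codons, not complexly variegated ones), and every encoded variant is equally likely to be present in a given package. The names codon_stats and p_displayed are hypothetical.

```python
# Illustrative sketch only; function names and the equimolar and
# equal-likelihood assumptions are introduced here, not taken from the claims.
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering ("*" marks a stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_stats(pattern: str) -> dict:
    """Summarize what an equimolar degenerate codon such as 'NNT' encodes."""
    expansions = product(*(IUPAC[base] for base in pattern.upper()))
    counts = Counter(CODON_TABLE["".join(codon)] for codon in expansions)
    n_codons = sum(counts.values())
    stops = counts.pop("*", 0)  # stop codons are reported separately
    return {
        "codons": n_codons,
        "amino_acids": len(counts),
        "stops": stops,
        # most favored : least favored amino acid, by codon count
        "max_min_ratio": max(counts.values()) / min(counts.values()),
    }


def p_displayed(n_variants: float, n_packages: float) -> float:
    """Chance that one given variant occurs at least once among n_packages
    independent packages, assuming all n_variants are equally likely."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_packages


if __name__ == "__main__":
    for pattern in ("NNT", "NNG", "NNK"):
        print(pattern, codon_stats(pattern))
    # Hypothetical library: 1e7 encoded variants, 3e7 independent packages.
    print(p_displayed(1e7, 3e7))
```

Under these assumptions NNT yields 15 amino acids from 16 trinucleotides with no stop codon, whereas NNK yields 20 amino acids and one stop codon from 32 trinucleotides.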